The following results have been obtained by the contractors. Since all of them are concerned with the theory of, or numerical approximations in, nonlinear filtering, we first recall briefly what the problem of nonlinear filtering is.

Let {X_t, Y_t; t ≥ 0} be a pair of stochastic processes (for the sake of simplicity of the exposition, all the processes below will be one-dimensional) satisfying:


$$
\begin{aligned}
X_t &= X_0 + \int_0^t f(X_s)\,ds + \int_0^t g(X_s)\,dW_s, \\
Y_t &= \int_0^t h(X_s)\,ds + \sigma V_t,
\end{aligned}
\tag{0.1}
$$


where {W_t, V_t; t ≥ 0} are two standard Wiener processes, which we shall mostly suppose to be independent, and X_0 is a random variable independent of {W_t, V_t; t ≥ 0}. The process {X_t} is unobserved. We observe {Y_t}, and seek to estimate X_t given the information available at time t, i.e. given 𝒴_t, the σ-algebra generated by {Y_s; 0 ≤ s ≤ t}.
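To make the model (0.1) concrete, here is a minimal Euler-Maruyama sketch that simulates one path of the signal X and of the integrated observation Y on a uniform time grid. It is only an illustration: the function name `simulate_model`, the step count and the example coefficients are hypothetical choices, not taken from the report.

```python
import numpy as np

def simulate_model(f, g, h, sigma, x0, T, n_steps, rng):
    """Euler-Maruyama sketch of (0.1): returns X and Y on a uniform grid of [0, T]."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, 0.0
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)   # increments of W (signal noise)
    dV = rng.normal(0.0, np.sqrt(dt), n_steps)   # increments of V, independent of W
    for i in range(n_steps):
        x[i + 1] = x[i] + f(x[i]) * dt + g(x[i]) * dW[i]
        # the observation integrates h(X) and is corrupted by sigma * dV
        y[i + 1] = y[i] + h(x[i]) * dt + sigma * dV[i]
    return x, y

# hypothetical example: mean-reverting signal observed through sin, small noise
rng = np.random.default_rng(0)
x, y = simulate_model(lambda s: -s, lambda s: 1.0, np.sin, 0.1, 0.5, 1.0, 1000, rng)
```

Only the increments of Y are used by the filters below, which is consistent with working with Y rather than with the white-noise observation y.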

Note that the choice of the above model means that we have chosen {X_t} to be a continuous Markov process, and that we observe:

$$ y_t = h(X_t) + \sigma \eta_t $$

where the observation noise {η_t} is a white noise. The assumption that the observation noise is white (i.e. "δ-correlated") is crucial: it is the only case which can be solved (besides those which can be reduced to it via appropriate transformations). The above model is obtained by applying the transformation:

$$ Y_t = \int_0^t y_s\,ds. $$

The reason for this is to avoid handling generalized processes (white noise is not a process in the ordinary sense). However, in the case where the observation noise and the signal {h(X_t)} are independent, the theory of nonlinear filtering has recently been developed in the white noise setting, i.e. using {y_t} (and not {Y_t}) as the observed process, see Kallianpur-Karandikar [7].
Going back to our model, the "best" estimate of any function of the unknown r.v. X_t, say φ(X_t), based on 𝒴_t is the conditional mean:

$$ E[\varphi(X_t) \mid \mathcal{Y}_t] $$


and computing that quantity for "any" function reduces to computing the conditional law of X_t given 𝒴_t. Assuming that this conditional law has a density q(t,x), it is well known (see e.g. Pardoux [13]) that q(t,x) = (∫ p(t,x) dx)^{-1} p(t,x), where the "unnormalized conditional density" p(t,x) solves the following stochastic partial differential equation, called Zakai's equation:

$$
\begin{aligned}
d_t p(t,x) &= L^* p(t,x)\,dt + h(x)\,p(t,x)\,dY_t, \qquad t > 0, \\
p(0,x) &= p_0(x),
\end{aligned}
\tag{0.2}
$$


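where p_0(x) is the density of the law of X_0, and L*, the "forward" (Fokker-Planck) operator of X_t, is the adjoint of the "backward" operator, i.e. of the infinitesimal generator of X_t:

$$ L = \frac{1}{2}\, g^2(x)\, \frac{d^2}{dx^2} + f(x)\, \frac{d}{dx}. $$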


Note that at each time t, p(t,·) is a (random) function of x, i.e. a (random) element of an infinite dimensional space. This is of course a serious problem for practical algorithms.

Let  us  now  describe  the  results  which  we  have  obtained  during  the  period  covered  by 
the  present  contract. 


1. A uniqueness theorem for Zakai's equation

It is of interest to give conditions under which the "unnormalized conditional density" is the unique solution of equation (0.2) within a certain class of processes. Such a uniqueness result is obtained by M. Chaleyat-Maurel, D. Michel, E. Pardoux [3], under the condition that the coefficients f, g and h be bounded and smooth (they are allowed at each time t to depend on the past of the observed process {Y_t}, which is very important for applications in stochastic control); the two Wiener processes {W_t} and {V_t} in (0.1) are allowed to be correlated.

Until now, uniqueness was known only under additional restrictions: either the non-degeneracy of L, or the independence of {W_t} and {V_t}. Note that in the latter case, uniqueness is known to hold even with an unbounded observation function h.
2. Non-existence of a finite dimensional optimal filter

We noted above that the solution of Zakai's equation takes values in an infinite dimensional space. It could in fact happen that the solution varies only in a finite dimensional subspace of that infinite dimensional space.

This  is  indeed  the  case  in  all  the  situations  where  a  finite  dimensional  optimal  (or 
“exact”)  filter  is  known  to  exist,  i.e.  in  the  linear  case  and  in  a  few  extensions  of  the  linear 
case,  see  Haussmann-Pardoux  [6]. 

It was conjectured at the beginning of this decade, and then rigorously proved, that under some mild conditions which are satisfied in most nonlinear situations, there is no finite dimensional equation driven by the observation such that the conditional law would be a function of its solution.

This means essentially that the solution of Zakai's equation does not stay in any finite dimensional space, or that its probability law fills the function space in which it lives. Such a property is very close to the kind of properties which can be proved for (finite dimensional) stochastic differential equations via the Malliavin calculus.

Ocone [4] has developed a Malliavin calculus analysis of stochastic partial differential equations and applied it to nonlinear filtering. Ocone and Pardoux [5] improve the application to nonlinear filtering, in that the criterion is much easier to check on practical examples.

3.  Time  discretization  of  Zakai’s  equation 

The research reported in the previous section shows that, in most cases, there is no chance that Zakai's equation can be solved by means of a finite number of statistics built from the observation process. Therefore, it is worth studying numerical methods for the approximate solution of Zakai's equation, i.e. the stochastic partial differential equation:

$$
dp_t = L^* p_t\,dt + h\,p_t\,dY_t, \qquad p_0 = p.
$$

Moreover,  this  could  provide  a  reference  method  with  which  to  compare  some  approximate 
nonlinear  filters  such  as  those  described  in  the  next  section. 

Le Gland [9] has studied the problem of time discretization. The idea is to use some kind of Trotter product formula and then to study the rate of convergence with respect to the time step Δt.

Let 0 = t_0 < ··· < t_i < ··· < t_N = T be a uniform partition of [0, T] with time step Δt.

A first scheme is the following:

$$ p_{i+1} = \exp\big(h\,\Delta Y_i - \tfrac{1}{2}\,h^2\,\Delta t\big)\, P_{\Delta t}\, p_i $$

where ΔY_i = Y_{t_{i+1}} − Y_{t_i} is the way the observation process is used, and P_{Δt} is the semigroup with generator L*. It is proved that:


The  second  scheme  is: 

Pi+i  =  exp(h^|  -  -  ^h^)Pi 

where the observation process is used through increments which add up to ΔY_i, and the semigroup P_{Δt} is replaced by a perturbation of it depending on the function h. With a suitable choice of this perturbation, it is proved that:

1p,(i) -p(<i,i)P  dx| 

The interest of such product formulas is that the deterministic part and the stochastic part have been decoupled. In particular, the next steps (approximation of the semigroup P_{Δt} and space discretization) can be handled quite easily. Moreover, a probabilistic interpretation is available for the two numerical schemes described above.
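The first product-formula scheme can be sketched in one dimension on a truncated uniform grid. Assumptions not in the text: the semigroup P_Δt is replaced by one implicit Euler step of a central finite-difference approximation of L*, the density is taken to vanish outside the grid, and σ = 1 as in the displayed Zakai equation. The names `splitting_up_filter` and `conditional_mean` are hypothetical.

```python
import numpy as np

def splitting_up_filter(p0, grid, f, g, h, dY, dt):
    """Sketch of p_{i+1} = exp(h dY_i - h^2 dt / 2) P_dt p_i on a uniform grid (sigma = 1)."""
    n = grid.size
    dx = grid[1] - grid[0]
    # central differences; the density is assumed to vanish outside the grid
    D1 = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * dx)
    D2 = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / dx**2
    # L* p = (1/2) (g^2 p)'' - (f p)'
    L_star = 0.5 * D2 @ np.diag(g(grid) ** 2) - D1 @ np.diag(f(grid))
    # one implicit Euler step stands in for the semigroup P_dt
    P_dt = np.linalg.inv(np.eye(n) - dt * L_star)
    hx = h(grid)
    densities = [p0]
    for dy in dY:
        prediction = P_dt @ densities[-1]                  # deterministic part
        correction = np.exp(hx * dy - 0.5 * hx**2 * dt)    # stochastic part
        densities.append(correction * prediction)
    return np.array(densities)

def conditional_mean(p, grid):
    """Normalize the unnormalized density p and return the mean of X on the grid."""
    return float(np.sum(grid * p) / np.sum(p))
```

The two factors are applied one after the other, which is exactly the decoupling of the deterministic and stochastic parts mentioned above; any other approximation of P_Δt could be substituted for the implicit Euler step.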

4. Nonlinear filtering with high signal-to-noise ratio

Since the optimal filter is usually infinite dimensional, it is of practical importance to find good low-dimensional approximations for certain classes of problems. One class of this kind is the class of problems with high signal-to-noise ratio, in other words small observation noise, i.e. σ in (0.1) is "small". More precisely, we are looking for approximate finite dimensional filters which behave well as σ → 0.

This problem was first considered by Katzur, Bobrovsky, Schuss [8] in the case where h is one-to-one. In that case, the filtering problem becomes trivial when σ = 0, i.e. the process {X_t} is completely observed, and the variance of the conditional law is zero.

When σ > 0 is small, one may expect that the variance of the conditional law is small. Then, if X̂_t = E(X_t | 𝒴_t), X_t and X̂_t are close, and consequently:

$$
\begin{aligned}
f(X_t) &\simeq f(\hat X_t) + f'(\hat X_t)\,(X_t - \hat X_t), \\
g(X_t) &\simeq g(\hat X_t), \\
h(X_t) &\simeq h(\hat X_t) + h'(\hat X_t)\,(X_t - \hat X_t).
\end{aligned}
$$

But if we replace f(X_t), g(X_t) and h(X_t) in (0.1) by their above approximations, we transform (0.1) into a linear filtering problem, which has a finite dimensional solution, namely the Kalman filter (which is the extended Kalman filter for (0.1)).

The above considerations tend to indicate that the extended Kalman filter (or possibly other types of approximate filters) might give good results when σ is small. This has been precisely formulated and rigorously proved by Picard [15] and then by Bensoussan [1] using a simpler argument.
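A minimal sketch of the extended Kalman filter for the scalar model (0.1): the Kalman-Bucy filter of the linearized model, with the gain and the Riccati equation evaluated at the current estimate. The explicit Euler discretization (one step per observation increment) and the function name `extended_kalman_bucy` are assumptions made for illustration; this is not Picard's or Bensoussan's construction.

```python
import numpy as np

def extended_kalman_bucy(f, df, g, h, dh, sigma, x0, P0, dY, dt):
    """Euler-discretized extended Kalman filter for dX = f dt + g dW, dY = h dt + sigma dV."""
    x, P = float(x0), float(P0)
    xs, Ps = [x], [P]
    for dy in dY:
        H = dh(x)                                  # h'(x_hat): linearized observation
        K = P * H / sigma**2                       # Kalman gain
        innovation = dy - h(x) * dt                # observed increment minus its prediction
        x_next = x + f(x) * dt + K * innovation
        # Riccati equation of the linearized model:
        # dP/dt = 2 f'(x) P + g(x)^2 - (P h'(x))^2 / sigma^2
        P = P + (2.0 * df(x) * P + g(x) ** 2 - (P * H) ** 2 / sigma**2) * dt
        x = x_next
        xs.append(x)
        Ps.append(P)
    return np.array(xs), np.array(Ps)
```

In the linear case f(x) = −x, g = 1, h(x) = x, the variance P converges to the positive root of the algebraic Riccati equation, σ²(√(1 + 1/σ²) − 1), whatever the observations.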

Some  new  results  have  been  obtained  more  recently. 
4.a Nonlinear filtering with high signal-to-noise ratio in discrete time

Consider a discrete-time version of (0.1), i.e.:

$$ X_{k+1} = X_k + f(X_k)\,\Delta t + \sqrt{\Delta t}\,W_{k+1} $$

Note that we are in the case g = 1, h linear. If we let σ → 0 while Δt is kept fixed, clearly there is no reason to use a cleverer filter than the very simple estimate:

$$ \hat X_k = h^{-1} Y_k $$


i.e. we use only the last observation. On the other hand, if σ and Δt tend to zero together, and we specify a relation of the form Δt = σ^α (α > 0), one may expect to obtain a discrete-time analog of Picard's result, i.e. one can show that the filter:

=  Xk  +  biXk)At  +  — (n-n  -  hXk+,) 

(T 


has a "good" behavior for σ, Δt small.

This has been shown in Milheiro [10]. This result will be very useful for the numerical implementation of Picard's filters.


4.b Piecewise linear filtering with high signal-to-noise ratio

Suppose now that again σ is small, but now f and h are continuous and piecewise linear, while g = 1. If h is one-to-one, we can apply Picard's result. But we are specifically interested in the case where h is not one-to-one, for example:

$$
h(x) = \begin{cases} h_- \, x & \text{if } x < 0, \\ h_+ \, x & \text{if } x > 0, \end{cases}
$$

with h_+ h_- < 0. Suppose for simplicity that f = 0. If h_+ = −h_-, clearly the conditional law of X_t given 𝒴_t is symmetric around zero and there is no way to get a really good estimate of X_t. On the other hand, if h_+ ≠ −h_-, then for σ = 0, X_t is completely observed from {Y_s, 0 ≤ s ≤ t}, since the sign of X_t can be recovered from the quadratic variation of Y_t. Therefore, one may expect that, for σ small, the variance of the conditional law is small, and that E(X_t | 𝒴_t) is well approximated by the output of the two Kalman filters corresponding to h(x) = h_+ x and h(x) = h_- x.

Fleming, Ji, Pardoux [5] show that from the outputs of the two Kalman filters corresponding to h(x) = h_+ x and h(x) = h_- x, one can define a test statistic in order to decide the sign of X_t, and consequently which of the two Kalman filters currently gives a good estimate of X_t. Note that the fact that one of a bank of Kalman filters gives a good estimate of X_t is true only in the case of a high signal-to-noise ratio. Without that hypothesis, a completely different approach has to be taken, see e.g. Pardoux, Savona [14], Savona [16].
5. Parameter estimation: the EM algorithm

We consider the following situation:

$$
\begin{aligned}
dX_t &= b_\theta(X_t)\,dt + dW_t, \\
dY_t &= h_\theta(X_t)\,dt + dV_t,
\end{aligned}
$$

with independent Wiener processes, and we assume that the law of X_0 has a density p_0^θ. The problem is to estimate the unknown parameter θ on the basis of the observation of {Y_t}. It can be shown that the likelihood function for this problem can be computed using the solution of the corresponding Zakai equation. An alternative approach, the EM algorithm, has been considered by Dembo-Zeitouni [4]: it is an iterative algorithm where, at each iteration, a new auxiliary function of the parameter is computed and maximized.

Campillo-Le Gland [2] have shown that the computation of this auxiliary function involves the solution of a nonlinear smoothing equation and also some recent results on stochastic integration with anticipating integrands, due among others to Nualart and Pardoux.

Some numerical experiments have been made which show that, whenever some noise intensities in the system are small, the EM algorithm converges very slowly. Time discretizations of the stochastic partial differential equations involved have been proposed.

B)  TRANSFER  TO  US 

E. Pardoux has given a series of "distinguished lectures" at the Systems Research Center, Univ. of Maryland, on the applications of the Malliavin calculus, and A. Bensoussan on nonlinear filtering and stochastic control with partial observation.

F. Campillo has presented the results on the EM algorithm at the IEEE CDC in Los Angeles (December 1987).

E. Pardoux has presented some results on nonlinear filtering with high signal-to-noise ratio at the conference in honor of W. Fleming, at Brown University (April 1988).
Abstract

Two algorithms are compared for maximizing the likelihood function associated with parameter estimation in partially observed diffusion processes:

• the EM algorithm, investigated by Dembo and Zeitouni [2], an iterative algorithm where, at each iteration, an auxiliary function is computed and maximized,

• the direct approach, where the likelihood function itself is computed and maximized.

This leads to a comparison of nonlinear smoothing and nonlinear filtering for the computation of a class of conditional expectations related to the problem of estimation (Section 3). In particular, it is shown that smoothing is indeed necessary for the EM algorithm approach to be efficient.

Time-discretization schemes for the stochastic PDEs involved in the algorithms are given, and the link with the discrete-time case (hidden Markov model) is explored.

Numerical results are presented (Section 6), with the conclusion that direct maximization should be preferred whenever some noise covariances associated with the parameters to be estimated are small.

Keywords: parameter estimation, maximum likelihood, EM algorithm, diffusion processes, nonlinear filtering, nonlinear smoothing, Skorokhod integral, time-discretization.
1 Introduction: the EM algorithm

The EM algorithm is an iterative algorithm for maximizing a likelihood function in a context of partial information [3]. Indeed, let {P_θ : θ ∈ Θ} be a family of mutually absolutely continuous probability measures on a measurable space (Ω, 𝓕), absolutely continuous with respect to a reference probability measure R, and let 𝒴 ⊂ 𝓕 be the σ-algebra containing all the available information. Then, the log-likelihood function for the estimation of the parameter θ can be defined as

$$ L(\theta) \triangleq \log E_R\Big(\frac{dP_\theta}{dR} \,\Big|\, \mathcal{Y}\Big), \tag{1} $$

and the MLE (maximum likelihood estimate) as

$$ \hat\theta \in \arg\max_{\theta \in \Theta} L(\theta). $$

The EM algorithm is based on the following straightforward application of Jensen's inequality

$$ L(\theta) - L(\theta') = \log E_{\theta'}\Big(\frac{dP_\theta}{dP_{\theta'}} \,\Big|\, \mathcal{Y}\Big) \ge E_{\theta'}\Big(\log\frac{dP_\theta}{dP_{\theta'}} \,\Big|\, \mathcal{Y}\Big) \triangleq Q(\theta,\theta'), \tag{2} $$

which gives, for each value θ' of the parameter, a global lower bound of the log-likelihood function θ ↦ L(θ) by means of an auxiliary function θ ↦ L(θ') + Q(θ,θ'), with equality at θ = θ'. The algorithm iterations are described by the following steps:

1. p = 0, initial guess θ_0,

2. set θ' = θ_p,

3. (E-step) compute Q(·,θ'),

4. (M-step) find θ_{p+1} such that Q(θ_{p+1},θ') ≥ Q(θ,θ') for all θ ∈ Θ,

5. if a stopping test is satisfied,
then set final estimate θ̂ = θ_{p+1},
else repeat from step 2 with p = p+1.
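The five steps can be sketched as a generic loop that takes the E-step and M-step as callables. To keep the example checkable it is instantiated on a toy model that is not the diffusion model of this paper: i.i.d. latent X_j ~ N(θ, q) observed through Y_j = X_j + noise of variance s. For this toy model Q(θ, θ') equals, up to terms not depending on θ, −(1/2q) Σ_j E_θ'[(X_j − θ)² | Y_j], so the M-step is the average of the conditional means, and the MLE is the sample mean of the Y_j. The names `em` and `gaussian_em` are hypothetical.

```python
import numpy as np

def em(theta0, e_step, m_step, tol=1e-10, max_iter=10_000):
    """Generic EM loop following steps 1-5 above (scalar parameter)."""
    theta = theta0                                 # step 1: initial guess
    for p in range(max_iter):
        stats = e_step(theta)                      # steps 2-3: theta' = theta_p, compute Q(., theta')
        theta_next = m_step(stats)                 # step 4: maximize Q(., theta')
        if abs(theta_next - theta) < tol:          # step 5: stopping test
            return theta_next, p + 1
        theta = theta_next
    return theta, max_iter

def gaussian_em(y, q, s, theta0=0.0):
    """Toy model: X_j ~ N(theta, q), Y_j = X_j + N(0, s) noise, all independent."""
    def e_step(theta_prime):
        # E_theta'[X_j | Y_j]: precision-weighted average of the prior mean and the observation
        return (theta_prime / q + y / s) / (1.0 / q + 1.0 / s)
    def m_step(cond_means):
        return float(np.mean(cond_means))
    return em(theta0, e_step, m_step)
```

For example, `gaussian_em(np.array([1.0, 2.0, 4.0]), q=1.0, s=1.0)` returns an estimate within the tolerance of the sample mean together with the number of iterations used.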

An interesting feature of the algorithm is that it generates a maximizing sequence {θ_p : p = 0, 1, ...}, in the sense that L(θ_{p+1}) ≥ L(θ_p): indeed, by (2), L(θ_{p+1}) − L(θ_p) ≥ Q(θ_{p+1}, θ_p) ≥ Q(θ_p, θ_p) = 0. Some general convergence results about the sequences {L(θ_p) : p = 0, 1, ...} and {θ_p : p = 0, 1, ...} are proved in [13], under mild regularity assumptions on L(·) and Q(·,·), see also [2, Theorem 2]. To prove the existence of smooth enough (in the a.s. sense) versions of θ ↦ L(θ) and (θ,θ') ↦ Q(θ,θ'), as well as to get the expression of the corresponding derivatives, one can rely e.g. on [12, Lemma 1].
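The lower bound (2), on which the monotonicity of the iterates rests, can be checked numerically on a toy hidden-variable model with a binary latent X and P_θ(X = 1) = θ. The observation likelihoods ℓ(y | x) below are hypothetical numbers, and the model is not the diffusion model of this paper; since ℓ(y | x) does not depend on θ, dP_θ/dP_θ' reduces to P_θ(X)/P_θ'(X).

```python
import numpy as np

LIK = np.array([0.2, 0.7])   # hypothetical likelihoods l(y | x = 0), l(y | x = 1) of the observed y

def prior(theta):
    return np.array([1.0 - theta, theta])

def log_likelihood(theta):
    """L(theta) = log sum_x P_theta(X = x) l(y | x)."""
    return float(np.log(prior(theta) @ LIK))

def Q(theta, theta_prime):
    """Q(theta, theta') = E_theta'[log P_theta(X) / P_theta'(X) | y]."""
    posterior = prior(theta_prime) * LIK
    posterior /= posterior.sum()                   # conditional law of X given y under theta'
    return float(posterior @ np.log(prior(theta) / prior(theta_prime)))

# L(theta) - L(theta') - Q(theta, theta') over a grid of parameter pairs
thetas = np.linspace(0.05, 0.95, 19)
gaps = [log_likelihood(a) - log_likelihood(b) - Q(a, b) for a in thetas for b in thetas]
```

All the gaps are nonnegative, and Q(θ', θ') = 0, which is the equality case at θ = θ'.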

To decide whether this algorithm is interesting from a computational point of view, the following three questions should be answered:

[E] how expensive is the computation of the auxiliary function Q(·,θ')?

[M] how easy is the maximization of the auxiliary function Q(·,θ')?

[EM] how fast is the convergence of this sub-optimal iterative algorithm towards the MLE?


In [2], the EM algorithm has been applied in the context of continuous-time partially observed stochastic processes. In the particular case of diffusion processes, the general expression of Q(θ,θ') has been derived and said to involve a nonlinear smoothing problem. The purpose of this work is to address the following three points:

• discuss the expression in [2] giving Q(θ,θ') in terms of a nonlinear smoothing problem; this will involve generalized stochastic calculus (Skorokhod integral),

• get an equivalent expression, in terms of a nonlinear filtering problem, for Q(θ,θ') and its gradient ∇^{1,0}Q(θ,θ'); it will turn out that smoothing is indeed necessary for the point [M] introduced above to be satisfied, although filtering is enough to compute Q(θ,θ') for a given pair (θ,θ'),

• get similar expressions for the original log-likelihood function L(θ) and its gradient ∇L(θ).

This will make it possible to compare, from a computational point of view, the two possible methods for maximum likelihood estimation:

• direct maximization of the likelihood function [4],

• the EM algorithm.

In particular, the point [M] will receive a positive answer, which is indeed the main motivation for the EM algorithm. On the other hand, it will be proved that computing the auxiliary function Q(·,θ') is a heavier task than computing the original log-likelihood function L(·).

As for the point [EM], numerical examples will show that the convergence of the EM algorithm may be very slow. This typically occurs in those cases where, for each θ' ∈ Θ, the function L(θ') + Q(·,θ') is very sharply peaked below the log-likelihood function L(·). In such cases indeed, maximizing the auxiliary function does not update the current estimate significantly at each M-step.
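The slow-convergence effect can be reproduced on a toy Gaussian model (again not the paper's diffusion model, nor its numerical example): with i.i.d. X_j ~ N(θ, q) and observation noise variance s, one EM step maps θ to (sθ + qȳ)/(s + q), so the error contracts by the factor s/(s + q), which tends to 1 as the variance q associated with the parameter becomes small. The sketch counts iterations until a hypothetical tolerance is met; the name `em_iterations` is hypothetical.

```python
import numpy as np

def em_iterations(y, q, s, theta0=0.0, tol=1e-8, max_iter=1_000_000):
    """Count EM iterations for the toy model X_j ~ N(theta, q), Y_j = X_j + N(0, s)."""
    theta = theta0
    for p in range(1, max_iter + 1):
        # E-step (conditional means E_theta[X_j | Y_j]) and M-step (their average) combined
        theta_next = float(np.mean((theta / q + y / s) / (1.0 / q + 1.0 / s)))
        if abs(theta_next - theta) < tol:
            return p
        theta = theta_next
    return max_iter

y = np.array([1.0, 2.0, 4.0])
counts = {q: em_iterations(y, q, s=1.0) for q in (1.0, 0.1, 0.01)}
```

The iteration count grows roughly like 1/q when q is small, whereas the MLE (the sample mean of the Y_j) is available directly, which is in line with the preference for direct maximization stated in the abstract.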

The statistical model is presented in Section 2, where expressions are given for L(θ), ∇L(θ), Q(θ,θ') and ∇^{1,0}Q(θ,θ') in terms of conditional expectations. It turns out that the last three expressions all belong to a certain class of conditional expectations. Two methods are then proposed in Section 3 for the computation of conditional expectations in this class, one based on nonlinear filtering, the other on nonlinear smoothing and involving generalized stochastic calculus (Skorokhod integral). These results are applied in Section 4 to the computation of L(θ), ∇L(θ), Q(θ,θ') and ∇^{1,0}Q(θ,θ') in terms of nonlinear filtering and nonlinear smoothing conditional densities. Section 5 is devoted to the time-discretization of the stochastic PDEs introduced in Section 4, and the link with MLE of parameters in partially observed Markov chains (hidden Markov models) is explored. A numerical example is presented in Section 6, and the influence of noise covariances is investigated.
2 Statistical model

In this section, expressions for the log-likelihood function L(·) and the auxiliary function Q(·,·) will be derived in the following context [2, Section 3].

Suppose that on a measurable space (Ω, 𝓕) are given

• a family {P_θ : θ ∈ Θ} of probability measures,

• a pair of stochastic processes {X_t : t ≥ 0} and {Y_t : t ≥ 0} taking values in R^m and R^d respectively,

such that under P_θ

$$
\begin{aligned}
dX_t &= b_\theta(X_t)\,dt + \sigma(X_t)\,dW_t^\theta, \qquad X_0 \sim p_0^\theta(\cdot), \\
dY_t &= h_\theta(X_t)\,dt + d\bar W_t^\theta,
\end{aligned}
\tag{3}
$$

where {W_t^θ : t ≥ 0} and {W̄_t^θ : t ≥ 0} are independent Wiener processes, with covariance matrix I and r respectively, and the pair is independent from the r.v. X_0.

The following hypotheses are made:

(H1) σ(·) is a continuous and bounded function on R^m such that a(·) = σσ*(·) is a uniformly elliptic m × m matrix, i.e. a(·) ≥ cI for some constant c > 0,

and, for all θ ∈ Θ, an open subset of R^p (the set of parameters),

(H2) p_0^θ(·) is a density on R^m,

(H3) b_θ(·) is a measurable and bounded function from R^m to R^m,

(H4) h_θ(·) is a measurable and bounded function from R^m to R^d,

and in addition

(H5) the probability measures on R^m with densities {p_0^θ(·) : θ ∈ Θ} are mutually absolutely continuous.

Moreover, it is assumed that p_0^θ(·), b_θ(·) and h_θ(·) are continuously differentiable with respect to the parameter θ and that, for all θ ∈ Θ, the derivatives ∇b_θ(·) and ∇h_θ(·) are measurable and bounded functions on R^m, with values in R^{m×p} and R^{d×p} respectively (throughout this paper, ∇ will denote the derivative with respect to the parameter θ).

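The existence and uniqueness of a weak solution to the stochastic differential equation (3) follows from hypotheses (H1-H3). If moreover hypotheses (H4-H5) hold, then for all T > 0, the probability measures {P_θ : θ ∈ Θ} restricted to 𝓕_T are mutually absolutely continuous on (Ω, 𝓕_T), with Radon-Nikodym derivative

$$
\begin{aligned}
\Lambda_{\theta,\theta'} \triangleq \frac{dP_\theta}{dP_{\theta'}}\Big|_{\mathcal F_T} = \frac{p_0^\theta}{p_0^{\theta'}}(X_0)
&\cdot \exp\Big( \int_0^T [b_\theta(X_s) - b_{\theta'}(X_s)]^* a^{-1}(X_s)\,\sigma(X_s)\,dW_s^{\theta'} \\
&\qquad - \frac12 \int_0^T [b_\theta(X_s) - b_{\theta'}(X_s)]^* a^{-1}(X_s)\,[b_\theta(X_s) - b_{\theta'}(X_s)]\,ds \Big) \\
&\cdot \exp\Big( \int_0^T [h_\theta(X_s) - h_{\theta'}(X_s)]^* r^{-1}\,d\bar W_s^{\theta'} \\
&\qquad - \frac12 \int_0^T [h_\theta(X_s) - h_{\theta'}(X_s)]^* r^{-1}\,[h_\theta(X_s) - h_{\theta'}(X_s)]\,ds \Big).
\end{aligned}
\tag{4}
$$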

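Consider also the probability measure P_θ^† defined on (Ω, 𝓕_T) by

$$ Z_T^\theta \triangleq \frac{dP_\theta}{dP_\theta^\dagger} = \exp\Big( \int_0^T h_\theta^*(X_s)\, r^{-1}\,dY_s - \frac12 \int_0^T h_\theta^*(X_s)\, r^{-1} h_\theta(X_s)\,ds \Big), $$

so that, under P_θ^†

$$ dX_t = b_\theta(X_t)\,dt + \sigma(X_t)\,dW_t^\theta, \qquad X_0 \sim p_0^\theta(\cdot), $$

where {W_t^θ : t ≥ 0} and {Y_t : t ≥ 0} are independent Wiener processes, with covariance matrix I and r respectively, and the pair is independent from the r.v. X_0. The Radon-Nikodym derivative Λ_{θ,θ'} can then be decomposed as

$$ \Lambda_{\theta,\theta'} = \frac{Z_T^\theta}{Z_T^{\theta'}}\, \Lambda_{\theta,\theta'}^\dagger \qquad \text{with} \qquad \Lambda_{\theta,\theta'}^\dagger = \frac{dP_\theta^\dagger}{dP_{\theta'}^\dagger}. $$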


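It is assumed that only {Y_t : 0 ≤ t ≤ T} is observed. Let {𝒴_t : 0 ≤ t ≤ T} denote the associated filtration. The likelihood function for the estimation of the parameter θ can be expressed as

$$ E_\alpha^\dagger\Big(\frac{dP_\theta}{dP_\alpha^\dagger} \,\Big|\, \mathcal Y_T\Big) = E_\alpha^\dagger\big(Z_T^\theta\, \Lambda_{\theta,\alpha}^\dagger \mid \mathcal Y_T\big) $$

with the particular choice R = P_α^† (α fixed in Θ) in (1). By Bayes formula

$$ E_\alpha^\dagger\big(Z_T^\theta\, \Lambda_{\theta,\alpha}^\dagger \mid \mathcal Y_T\big) = E_\theta^\dagger\big(Z_T^\theta \mid \mathcal Y_T\big) \cdot E_\alpha^\dagger\big(\Lambda_{\theta,\alpha}^\dagger \mid \mathcal Y_T\big) = E_\theta^\dagger\big(Z_T^\theta \mid \mathcal Y_T\big), $$

since Λ_{θ,α}^† is independent of 𝒴_T under P_α^† and has mean 1. This gives the following expression for the log-likelihood function L(·):

$$ L(\theta) = \log E_\theta^\dagger\big(Z_T^\theta \mid \mathcal Y_T\big). \tag{5} $$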


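For the auxiliary function Q(·,·) defined by (2), one has immediately

$$ Q(\theta,\theta') = E_{\theta'}\big(\lambda^{\theta,\theta'} \mid \mathcal Y_T\big) = \frac{E_{\theta'}^\dagger\big(\lambda^{\theta,\theta'}\, Z_T^{\theta'} \mid \mathcal Y_T\big)}{E_{\theta'}^\dagger\big(Z_T^{\theta'} \mid \mathcal Y_T\big)} \tag{6} $$

where

$$
\begin{aligned}
\lambda^{\theta,\theta'} &\triangleq \log \Lambda_{\theta,\theta'} \\
&= \log\frac{p_0^\theta}{p_0^{\theta'}}(X_0) + \int_0^T [b_\theta(X_s) - b_{\theta'}(X_s)]^* a^{-1}(X_s)\,\sigma(X_s)\,dW_s^{\theta'} \\
&\quad - \frac12 \int_0^T [b_\theta(X_s) - b_{\theta'}(X_s)]^* a^{-1}(X_s)\,[b_\theta(X_s) - b_{\theta'}(X_s)]\,ds \\
&\quad + \int_0^T [h_\theta(X_s) - h_{\theta'}(X_s)]^* r^{-1}\,d\bar W_s^{\theta'} - \frac12 \int_0^T [h_\theta(X_s) - h_{\theta'}(X_s)]^* r^{-1}\,[h_\theta(X_s) - h_{\theta'}(X_s)]\,ds.
\end{aligned}
\tag{7}
$$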


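Under additional regularity assumptions on the data p_0^θ(·), b_θ(·) and h_θ(·), it is easy to prove, using results in [12], that both θ ↦ L(θ) and θ ↦ Q(θ,θ') have a.s. differentiable versions, with gradients given by

$$ \nabla L(\theta) = E_\theta\big(\lambda^\theta \mid \mathcal Y_T\big) = \frac{E_\theta^\dagger\big(\lambda^\theta\, Z_T^\theta \mid \mathcal Y_T\big)}{E_\theta^\dagger\big(Z_T^\theta \mid \mathcal Y_T\big)}, \tag{8} $$

$$ \nabla^{1,0} Q(\theta,\theta') = E_{\theta'}\big(\lambda^\theta \mid \mathcal Y_T\big) = \frac{E_{\theta'}^\dagger\big(\lambda^\theta\, Z_T^{\theta'} \mid \mathcal Y_T\big)}{E_{\theta'}^\dagger\big(Z_T^{\theta'} \mid \mathcal Y_T\big)}, \tag{9} $$

respectively, where

$$
\begin{aligned}
\lambda^\theta &\triangleq \nabla^{1,0} \log \Lambda_{\theta,\theta'} = \nabla^{1,0} \lambda^{\theta,\theta'} \\
&= \frac{\nabla p_0^\theta}{p_0^\theta}(X_0) + \int_0^T [\nabla b_\theta(X_s)]^* a^{-1}(X_s)\,\sigma(X_s)\,dW_s^\theta + \int_0^T [\nabla h_\theta(X_s)]^* r^{-1}\,d\bar W_s^\theta
\end{aligned}
\tag{10}
$$

is independent of θ' (the dependence on θ' cancels since σ(X_s) dW_s^{θ'} = σ(X_s) dW_s^θ + [b_θ(X_s) − b_{θ'}(X_s)] ds, and similarly for W̄).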


Remark. One can check from (8) and (9) that

$$ \nabla^{1,0} Q(\theta,\theta')\big|_{\theta=\theta'} = \nabla L(\theta'), $$

as expected.


In the next section, two different methods will be given, by means of stochastic PDEs, to compute the various quantities introduced so far: L(θ), ∇L(θ), Q(θ,θ') and ∇^{1,0}Q(θ,θ'). This will make possible the numerical implementation of algorithms for the maximization of the likelihood function.
3 Smoothing vs. filtering for the computation of a class of conditional expectations

For the sake of simplicity, any reference to the parameter θ will be dropped throughout this section. In particular, P will denote the probability measure under which

$$
\begin{aligned}
dX_t &= b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 \sim p_0(\cdot), \\
dY_t &= h(X_t)\,dt + d\bar W_t,
\end{aligned}
$$

where {W_t : 0 ≤ t ≤ T} and {W̄_t : 0 ≤ t ≤ T} are independent Wiener processes, with covariance matrix I and r respectively, and the pair is independent from the r.v. X_0, whereas under P^†

$$ dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 \sim p_0(\cdot), $$

where {W_t : 0 ≤ t ≤ T} and {Y_t : 0 ≤ t ≤ T} are independent Wiener processes, with covariance matrix I and r respectively, and the pair is independent from the r.v. X_0. Therefore P = Z_T · P^†, where the process {Z_t : 0 ≤ t ≤ T} is defined by

$$ Z_t = \exp\Big( \int_0^t h^*(X_s)\, r^{-1}\,dY_s - \frac12 \int_0^t h^*(X_s)\, r^{-1} h(X_s)\,ds \Big). $$
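On a time grid, log Z_T can be approximated by left-point (Itô) sums. The sketch below assumes that the values h(X_{t_i}) along a given path and the observation increments are available as arrays; the function name `log_Z` is hypothetical, and this is one simple discretization among others.

```python
import numpy as np

def log_Z(h_path, dY, r, dt):
    """Left-point approximation of log Z_T = int h* r^-1 dY - (1/2) int h* r^-1 h ds.

    h_path: array (n, d) with h(X_{t_i}), i = 0..n-1
    dY:     array (n, d) with Y_{t_{i+1}} - Y_{t_i}
    r:      (d, d) covariance matrix of the observation noise
    """
    r_inv = np.linalg.inv(r)
    stochastic = np.einsum("id,de,ie->", h_path, r_inv, dY)       # sum_i h_i^* r^-1 dY_i
    quadratic = np.einsum("id,de,ie->", h_path, r_inv, h_path)    # sum_i h_i^* r^-1 h_i
    return float(stochastic - 0.5 * quadratic * dt)
```

For instance, with d = 1, r = 2, h ≡ 1 and noise-free increments dY_i = h Δt over [0, 1], the result is 1/2 − 1/4 = 0.25.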


The purpose of this section is to provide two different methods, one based on nonlinear filtering, the other on nonlinear smoothing, for the computation of the following class of conditional expectations

$$ A = E\Big( \beta(X_0) + \int_0^T \gamma(X_s)\,ds + \int_0^T \eta^*(X_s)\,d\bar W_s + \int_0^T \chi^*(X_s)\,\sigma(X_s)\,dW_s \,\Big|\, \mathcal Y_T \Big), \tag{11} $$

where β, γ, η and χ are measurable and bounded functions from R^m to R, R, R^d and R^m respectively. It is readily seen from (6)-(10) that the computation of either ∇L(θ), Q(θ,θ') or ∇^{1,0}Q(θ,θ') involves such conditional expectations.

It is clear from the definition that A depends linearly on (β, γ, η, χ). It will turn out that nonlinear smoothing is the only way to make this dependence explicit, although nonlinear filtering, which is simpler, is enough to just compute A.

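Rewriting A as

$$
\begin{aligned}
A = {}& E\big(\beta(X_0) \mid \mathcal Y_T\big) + \int_0^T E\big(\gamma(X_s) - \eta^*(X_s)\,h(X_s) \mid \mathcal Y_T\big)\,ds \\
&+ E\Big( \int_0^T \eta^*(X_s)\,dY_s \,\Big|\, \mathcal Y_T \Big) + E\Big( \int_0^T \chi^*(X_s)\,\sigma(X_s)\,dW_s \,\Big|\, \mathcal Y_T \Big),
\end{aligned}
\tag{12}
$$

one would like to interchange conditional expectation and stochastic integral in the third term of (12). However, the resulting formal expression

$$ \int_0^T E\big(\eta^*(X_s) \mid \mathcal Y_T\big)\,dY_s \tag{13} $$

is not an Itô integral, since the integrand is obviously not adapted to the filtration {𝒴_t : 0 ≤ t ≤ T}, and it needs to be given a rigorous meaning. Although the natural generalization of the Itô integral that allows anticipating integrands is the Skorokhod integral, it will be proved in Proposition 3.3 below that the correct statement is

$$ E\Big( \int_0^T \eta^*(X_s)\,dY_s \,\Big|\, \mathcal Y_T \Big) = E\Big( \int_0^T \eta^*(X_s) \circ dY_s \,\Big|\, \mathcal Y_T \Big) = \int_0^T E\big(\eta^*(X_s) \mid \mathcal Y_T\big) \circ dY_s \ne \int_0^T E\big(\eta^*(X_s) \mid \mathcal Y_T\big)\,dY_s, $$

where the non-adapted stochastic integrals are respectively a generalized Stratonovich integral and a Skorokhod integral [6].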

In addition, there seems to be no computable expression available for the last term of (12). However, in the particular case where χ derives from a scalar potential function, one has the following

Proposition 3.1. Assume there exists a scalar function U ∈ C_b^2(R^m) such that χ = DU. Then

$$ E\Big( \int_0^T \chi^*(X_s)\,\sigma(X_s)\,dW_s \,\Big|\, \mathcal Y_T \Big) = E\big(U(X_T) \mid \mathcal Y_T\big) - E\big(U(X_0) \mid \mathcal Y_T\big) - \int_0^T E\big(\mathcal L U(X_s) \mid \mathcal Y_T\big)\,ds, \tag{14} $$

whose proof follows immediately from Itô's lemma, since dU(X_t) = 𝓛U(X_t) dt + (DU)*(X_t) σ(X_t) dW_t.

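At this point, it is necessary to introduce some notations and definitions related to nonlinear filtering and smoothing.

Notations and definitions

• Filtering

π_t (resp. p_t) will denote the normalized (resp. unnormalized) conditional density of the r.v. X_t given 𝒴_t, i.e.

$$ (\pi_t, \phi) = E\big(\phi(X_t) \mid \mathcal Y_t\big), \qquad (p_t, \phi) = E^\dagger\big(\phi(X_t)\, Z_t \mid \mathcal Y_t\big) \tag{15} $$

for any test function φ. By Bayes formula

$$ \pi_t = \frac{p_t}{(p_t, 1)}. \tag{16} $$

The equation for {p_t : 0 ≤ t ≤ T} is the Zakai equation [8]

$$ dp_t = \mathcal L^* p_t\,dt + h^* p_t\, r^{-1}\,dY_t, \tag{17} $$

where 𝓛* denotes the adjoint operator of the infinitesimal generator 𝓛 of the diffusion process {X_t : 0 ≤ t ≤ T}, defined by

$$ \mathcal L = \frac12 \sum_{i,j=1}^m a_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_{i=1}^m b_i(x)\,\frac{\partial}{\partial x_i}. $$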
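• Smoothing (fixed-interval)

Let T > 0 denote the fixed end-time, and let μ_t (resp. q_t) denote the normalized (resp. unnormalized) conditional density of the r.v. X_t given 𝒴_T, i.e.

$$ (\mu_t, \phi) = E\big(\phi(X_t) \mid \mathcal Y_T\big), \qquad (q_t, \phi) = E^\dagger\big(\phi(X_t)\, Z_T \mid \mathcal Y_T\big). $$

Again

$$ \mu_t = \frac{q_t}{(q_t, 1)}. \tag{18} $$

Introducing the backward Zakai equation

$$ dv_t + \mathcal L v_t\,dt + h^* v_t\, r^{-1}\,dY_t = 0, \qquad v_T = 1, \tag{19} $$

one has [8,9] that (p_t, v_t) is independent of t, and q_t = p_t v_t is differentiable with

$$ \frac{\partial q_t}{\partial t} + p_t\, \mathcal L v_t = v_t\, \mathcal L^* p_t. \tag{20} $$

Note that, since (p_t, v_t) = (p_T, v_T) = (p_T, 1),

$$ (q_t, 1) = (p_T, 1), \qquad 0 \le t \le T. \tag{21} $$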
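A discrete analogue of (19)-(21) can be sketched in one dimension. Assumptions not in the text: scalar observation with r = 1, a product-formula discretization in the spirit of the time-discretization schemes discussed earlier for Zakai's equation (an implicit Euler finite-difference step M for the semigroup, and the observation factor ψ_i = exp(h ΔY_i − h² Δt / 2)), and a backward recursion chosen as the exact discrete adjoint of the forward one, v_i = Mᵀ(ψ_i v_{i+1}), so that the discrete pairing (p_i, v_i) does not depend on i. The function names are hypothetical.

```python
import numpy as np

def transition_matrix(grid, f, g, dt):
    """Implicit Euler step for a central finite-difference L* (zero outside the grid)."""
    n = grid.size
    dx = grid[1] - grid[0]
    D1 = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * dx)
    D2 = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / dx**2
    L_star = 0.5 * D2 @ np.diag(g(grid) ** 2) - D1 @ np.diag(f(grid))
    return np.linalg.inv(np.eye(n) - dt * L_star)

def forward_backward_zakai(p0, grid, f, g, h, dY, dt):
    """Return filtering p_i, backward v_i and unnormalized smoothing q_i = p_i v_i."""
    M = transition_matrix(grid, f, g, dt)
    hx = h(grid)
    psi = [np.exp(hx * dy - 0.5 * hx**2 * dt) for dy in dY]   # observation factors
    p = [p0]                                                 # forward (Zakai) recursion
    for w in psi:
        p.append(w * (M @ p[-1]))
    v = [np.ones_like(p0)]                                   # backward recursion, v_N = 1
    for w in reversed(psi):
        v.append(M.T @ (w * v[-1]))
    v.reverse()
    q = [pi * vi for pi, vi in zip(p, v)]
    return np.array(p), np.array(v), np.array(q)
```

Normalizing each q_i gives the smoothing densities, and the constancy of q_i summed over the grid is the discrete counterpart of (21).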


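3.1 Filtering approach

Define

$$ \lambda_t = \beta(X_0) + \int_0^t \gamma(X_s)\,ds + \int_0^t \eta^*(X_s)\,d\bar W_s + \int_0^t \chi^*(X_s)\,\sigma(X_s)\,dW_s, $$

so that, by Bayes formula

$$ A = E\big(\lambda_T \mid \mathcal Y_T\big) = \frac{E^\dagger\big(\lambda_T\, Z_T \mid \mathcal Y_T\big)}{E^\dagger\big(Z_T \mid \mathcal Y_T\big)}. $$

A first method would be to compute the joint conditional law of (λ_t, X_t) given 𝒴_t, and then integrate over the first variable. An alternative method is to find an equation for {w_t : 0 ≤ t ≤ T} defined by

$$ (w_t, \phi) \triangleq E^\dagger\big(\phi(X_t)\, \lambda_t\, Z_t \mid \mathcal Y_t\big). $$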

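Indeed, by Itô's lemma

$$
\begin{aligned}
d[\phi(X_t)\,\lambda_t\, Z_t] &= \lambda_t Z_t\, \mathcal L\phi(X_t)\,dt + \lambda_t Z_t\, (D\phi(X_t))^* \sigma(X_t)\,dW_t \\
&\quad + \phi(X_t)\, Z_t\, \gamma(X_t)\,dt + \phi(X_t)\, Z_t\, \eta^*(X_t)\,d\bar W_t + \phi(X_t)\, Z_t\, \chi^*(X_t)\,\sigma(X_t)\,dW_t \\
&\quad + \phi(X_t)\, \lambda_t\, Z_t\, h^*(X_t)\, r^{-1}\,dY_t + \phi(X_t)\, \eta^*(X_t)\, h(X_t)\, Z_t\,dt \\
&\quad + Z_t\, \chi^*(X_t)\, a(X_t)\, D\phi(X_t)\,dt.
\end{aligned}
$$

Using dW̄_t = dY_t − h(X_t) dt, properties of conditional expectation given the observation σ-algebra under the reference probability measure, and the definition (15), gives

$$ d(w_t, \phi) = (w_t, \mathcal L\phi)\,dt + (w_t, \phi\, h^*)\, r^{-1}\,dY_t + (p_t, \gamma\,\phi)\,dt + (p_t, \phi\,\eta^*)\,dY_t + (p_t, \mathcal J(\chi)\phi)\,dt, $$

where 𝒥(χ)φ ≜ χ* a Dφ, so that {w_t : 0 ≤ t ≤ T} solves

$$ dw_t = \mathcal L^* w_t\,dt + h^* w_t\, r^{-1}\,dY_t + \gamma\, p_t\,dt + \eta^* p_t\,dY_t + \mathcal J^*(\chi)\, p_t\,dt, \qquad w_0 = \beta\, p_0. \tag{22} $$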


Theorem 3.2. Let $(p_t : 0\le t\le T)$ and $(w_t : 0\le t\le T)$ be the unique solutions of (17) and (22) respectively. Then the following expression holds for $A$ defined in (11):

$$A = \frac{(w_T,1)}{(p_T,1)}. \qquad (23)$$


This expression is actually computable. Unfortunately, the linear dependence of $(w_T,1)$ on $(\beta,\xi,\eta,\chi)$ is not made explicit, and it must be made explicit for the point [M] introduced in the Introduction to be satisfied. Therefore, the next step will be to make this dependence more explicit. This will involve nonlinear smoothing and generalized stochastic calculus (Skorokhod integral). Actually

• the stochastic integral in (13) will be given a rigorous meaning,

• the last term in (12) will also be given a computable expression, whether or not $\chi$ derives from a scalar potential function.

3.2  Smoothing  approach 

The idea here is to compute the stochastic differential of the scalar product $(w_t,v_t)$, where $(v_t : 0\le t\le T)$ is the solution of the backward Zakai equation (19). Since (22) is a forward stochastic PDE and (19) is a backward stochastic PDE, one must use the two-sided stochastic calculus introduced in [10,11]. This gives

$$\begin{aligned}
d(w_t,v_t) &= (\mathcal{L}^* w_t,v_t)\,dt + (h^* w_t,v_t)\,r^{-1}\,dY_t \\
&\quad + (\xi\,p_t,v_t)\,dt + (\eta^* p_t,v_t)\,dY_t + (\mathcal{J}^*(\chi)\,p_t,v_t)\,dt \\
&\quad - (w_t,\mathcal{L} v_t)\,dt - (w_t,h^* v_t)\,r^{-1}\,dY_t \\
&= (q_t,\xi)\,dt + (q_t,\eta^*)\,dY_t + (p_t,\chi^* a\,Dv_t)\,dt.
\end{aligned}$$

Integrating from 0 to $T$ gives

$$(w_T,1) = (q_0,\beta) + \int_0^T (q_s,\xi)\,ds + \int_0^T (q_s,\eta^*)\,dY_s + \int_0^T (p_s,\chi^* a\,Dv_s)\,ds,$$

where the stochastic integral is a two-sided stochastic integral.

Using (21) gives an expression for $A$ in terms of normalized conditional densities

$$A = (\rho_0,\beta) + \int_0^T (\rho_s,\xi)\,ds + A' + A''.$$
• Study of $A'$

$$A' = \frac{\int_0^T (q_s,\eta^*)\,dY_s}{(p_T,1)}. \qquad (24)$$

One has

$$E(A') = E^\dagger(Z_T\,A') = E^\dagger\big(E^\dagger(Z_T\mid\mathcal{Y}_T)\,A'\big) = E^\dagger\Big(\int_0^T (q_s,\eta^*)\,dY_s\Big) = 0,$$

where the last equality follows from results on two-sided stochastic integrals. This was expected, since

$$A' = E\Big(\int_0^T \eta^*(X_s)\,dV_s\,\Big|\,\mathcal{Y}_T\Big).$$


Expressions in terms of normalized conditional densities are given by the following

Proposition 3.3. Let $(\rho_t : 0\le t\le T)$ denote the normalized smoothing density. Then

$$\begin{aligned}
A' &= \int_0^T (\rho_s,\eta^*)\,dY_s - \int_0^T (\rho_s,\eta^*)(\rho_s,h)\,ds \\
&= \int_0^T (\rho_s,\eta^*)\circ dY_s - \int_0^T (\rho_s,\eta^* h)\,ds,
\end{aligned}$$

where the non-adapted stochastic integrals are respectively a Skorokhod integral and a generalized Stratonovitch integral [6].


Proof. The idea is to get the denominator $F = 1/(p_T,1)$ inside the stochastic integral in (24).

Let first $D_\cdot$ denote, on the probability space $(\Omega,\mathcal{F},P^\dagger)$, the derivative with respect to the $d$-dimensional Wiener process $(Y_t : 0\le t\le T)$ in the direction of the vector space $L^2(0,T;\mathbb{R}^d)$. Since the two-sided integral is a particular case of the Skorokhod integral, it follows from [6, Proposition 3.2] that

$$\begin{aligned}
A' &= F\int_0^T (q_s,\eta^*)\,dY_s = \int_0^T F\,(q_s,\eta^*)\,dY_s + \int_0^T (q_s,\eta^*)\,D_s F\,ds \\
&= \int_0^T \frac{(q_s,\eta^*)}{(q_s,1)}\,dY_s - \int_0^T \frac{(q_s,\eta^*)}{(q_s,1)^2}\,D_s(p_T,1)\,ds,
\end{aligned}$$

where the stochastic integral is a Skorokhod integral, and where $F = 1/(q_s,1)$ by (21).


For $s$ fixed in $[0,T]$, consider the $d$-dimensional random process $(z_t : 0\le t\le T)$ defined by $z_t = D_s p_t$. Clearly $z_t = 0$ for $0\le t<s$. For $i = 1,\dots,d$, the process $(z^i_t : s\le t\le T)$ is the unique solution of the forward stochastic PDE [7]

$$dz^i_t = \mathcal{L}^* z^i_t\,dt + h^* z^i_t\,r^{-1}\,dY_t, \qquad z^i_s = h^i p_s.$$
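Introducing the solution $(v_t : 0\le t\le T)$ of the backward Zakai equation (19), and using again the two-sided stochastic calculus, gives $d(z_t,v_t) = 0$ for $s\le t\le T$. Therefore

$$D_s(p_T,1) = (z_T,v_T) = (z_s,v_s) = (q_s,h),$$

so that

$$\begin{aligned}
A' &= \int_0^T \frac{(q_s,\eta^*)}{(q_s,1)}\,dY_s - \int_0^T \frac{(q_s,\eta^*)}{(q_s,1)}\,\frac{(q_s,h)}{(q_s,1)}\,ds \\
&= \int_0^T (\rho_s,\eta^*)\,dY_s - \int_0^T (\rho_s,\eta^*)(\rho_s,h)\,ds.
\end{aligned}$$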
Jo  Jo 


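To get the second expression, consider the $d$-dimensional random process $(u_t : 0\le t\le T)$ defined by $u_t = (\rho_t,\eta)$. The Skorokhod-Stratonovitch transformation for generalized stochastic integrals gives [6, Theorem 7.3]

$$\int_0^T u^*_s\circ dY_s = \int_0^T u^*_s\,dY_s + \frac{1}{2}\int_0^T \big(D^+u_s + D^-u_s\big)\,ds,$$

where

$$D^+u_s = \lim_{t\downarrow s}\sum_{i=1}^d D^i_t u^i_s, \qquad D^-u_s = \lim_{t\uparrow s}\sum_{i=1}^d D^i_t u^i_s.$$

It turns out that

$$D^i_t u^i_s = \frac{(D^i_t q_s,\eta^i)}{(q_s,1)} - \frac{(q_s,\eta^i)\,(q_t,h^i)}{(q_s,1)^2}.$$

Next, $D_t q_s = (D_t p_s)\,v_s + p_s\,(D_t v_s)$. In particular $D_t p_s$ has already been studied, and a similar argument for $D_t v_s$ shows that $D^+q_s = D^-q_s = h\,q_s$. Therefore

$$\frac{1}{2}\big(D^+u_s + D^-u_s\big) = (\rho_s,\eta^* h) - (\rho_s,\eta^*)(\rho_s,h).$$

This finally gives

$$A' = \int_0^T (\rho_s,\eta^*)\circ dY_s - \int_0^T (\rho_s,\eta^* h)\,ds,$$

where the stochastic integral is now a generalized Stratonovitch integral. □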


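Remark. In terms of conditional expectations

$$\begin{aligned}
A' &= \int_0^T E(\eta^*(X_s)\mid\mathcal{Y}_T)\,dY_s - \int_0^T E(\eta^*(X_s)\mid\mathcal{Y}_T)\,E(h(X_s)\mid\mathcal{Y}_T)\,ds \\
&= \int_0^T E(\eta^*(X_s)\mid\mathcal{Y}_T)\circ dY_s - \int_0^T E(\eta^*(X_s)\,h(X_s)\mid\mathcal{Y}_T)\,ds.
\end{aligned}$$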


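• Study of $A''$

$$A'' = \frac{\int_0^T (p_s,\chi^* a\,Dv_s)\,ds}{(p_T,1)}. \qquad (25)$$

One has

$$E(A'') = E^\dagger(Z_T\,A'') = E^\dagger\big(E^\dagger(Z_T\mid\mathcal{Y}_T)\,A''\big) = E^\dagger\Big(\int_0^T (p_s,\chi^* a\,Dv_s)\,ds\Big) = \int_0^T \big(E^\dagger(p_s),\chi^* a\,E^\dagger(Dv_s)\big)\,ds,$$

where the last equality follows from the independence of $p_s$ and $v_s$ under the probability measure $P^\dagger$. Now $E^\dagger(Dv_s) = D\,E^\dagger(v_s) \equiv 0$ since $E^\dagger(v_s) = 1$. Therefore $E(A'') = 0$, which was expected since

$$A'' = E\Big(\int_0^T \chi^*(X_s)\,\sigma(X_s)\,dW_s\,\Big|\,\mathcal{Y}_T\Big).$$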


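The identities

$$p_s\,Dv_s = q_s\,D(\log v_s) = q_s\,D\Big(\log\frac{\rho_s}{\pi_s}\Big)$$

give the following two other expressions for $A''$, in terms of normalized conditional densities:

$$A'' = \int_0^T \big(\rho_s,\chi^* a\,D(\log v_s)\big)\,ds = \int_0^T \Big(\rho_s,\chi^* a\,D\Big(\log\frac{\rho_s}{\pi_s}\Big)\Big)\,ds.$$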

In the particular case where $\chi$ derives from a scalar potential function, it can be checked that (25) reduces to the expression (14) given in Proposition 3.1. Indeed

Proposition 3.4. Assume there exists a scalar function $U\in C^2_b(\mathbb{R}^m)$ such that $\chi = DU$. Then

$$A'' = (\rho_T,U) - (\rho_0,U) - \int_0^T (\rho_s,\mathcal{L}U)\,ds.$$

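Proof. It follows from the identity $\mathcal{L}(Uv_s) = U\,\mathcal{L}v_s + v_s\,\mathcal{L}U + \chi^* a\,Dv_s$, and from (20), that

$$\begin{aligned}
(p_s,\chi^* a\,Dv_s) &= (p_s,\mathcal{L}(Uv_s)) - (p_s,U\,\mathcal{L}v_s) - (p_s,v_s\,\mathcal{L}U) \\
&= (v_s\,\mathcal{L}^* p_s - p_s\,\mathcal{L}v_s,\,U) - (p_s v_s,\mathcal{L}U) = \Big(\frac{dq_s}{ds},U\Big) - (q_s,\mathcal{L}U).
\end{aligned}$$

Integrating from 0 to $T$ gives

$$\int_0^T (p_s,\chi^* a\,Dv_s)\,ds = (q_T,U) - (q_0,U) - \int_0^T (q_s,\mathcal{L}U)\,ds.$$

Dividing by $(p_T,1)$ and using (21) finishes the proof. □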


Remark. In terms of conditional expectations

$$A'' = E(U(X_T)\mid\mathcal{Y}_T) - E(U(X_0)\mid\mathcal{Y}_T) - \int_0^T E(\mathcal{L}U(X_s)\mid\mathcal{Y}_T)\,ds,$$

which is exactly (14).
The following theorem has been proved.

Theorem 3.5. Let $(\pi_t : 0\le t\le T)$ and $(\rho_t : 0\le t\le T)$ be the normalized filtering and smoothing densities (e.g. obtained from the unique solutions $(p_t : 0\le t\le T)$ and $(v_t : 0\le t\le T)$ of (17) and (19) respectively). Then the following two expressions hold for $A$ defined in (11):

$$A = (\rho_0,\beta) + \int_0^T (\rho_s,\xi)\,ds + \int_0^T \Big(\rho_s,\chi^* a\,D\Big(\log\frac{\rho_s}{\pi_s}\Big)\Big)\,ds + A',$$

$$\begin{aligned}
A' &= \int_0^T (\rho_s,\eta^*)\,dY_s - \int_0^T (\rho_s,\eta^*)(\rho_s,h)\,ds \\
&= \int_0^T (\rho_s,\eta^*)\circ dY_s - \int_0^T (\rho_s,\eta^* h)\,ds,
\end{aligned}$$

where the non-adapted stochastic integrals are respectively a Skorokhod integral and a generalized Stratonovitch integral [6].


Conclusion 

The advantage of smoothing over filtering is that the linear dependence on $(\beta,\xi,\eta,\chi)$ is made explicit. Provided the underlying probability measure does not change, evaluating $A$ for a different set of data $(\beta,\xi,\eta,\chi)$ will not require the computation of a new infinite-dimensional conditional density. In the filtering approach, one would have to solve another stochastic PDE, with a different “right-hand side”.

On the other hand, from the computational point of view, solving the equation for the smoothing density requires not only the computation but also the storage of the filtering density, and is therefore more expensive. Moreover, in the filtering approach it is enough to integrate the unnormalized densities $w_T$ and $p_T$ at final time $T$. In the smoothing approach one has (i) at each time $t$, to integrate some functions involving $(\xi,\eta,\chi)$ against the normalized smoothing density, and (ii) to integrate the resulting processes over the interval $[0,T]$.
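A discrete-time, finite-state toy version makes the trade-off concrete. The sketch below uses hypothetical names and assumes a finite-state chain with transition matrix `Pi`, initial law `p0` and precomputed likelihood weights `psi[n][x]`. It evaluates $A = E(\beta(X_0) + \sum_{n<N}\xi(X_n)\mid\mathcal{Y})$ in two ways. The first is an augmented forward recursion in the spirit of (22) and (23). The second uses smoothing marginals in the spirit of Theorem 3.5. The $\eta$ and $\chi$ terms are omitted.

```python
def forward_pass(p0, Pi, psi):
    """Unnormalized filters p_0..p_N with p_{n+1}(y) = sum_x p_n(x) psi_n(x) Pi[x][y]."""
    m = len(p0)
    ps = [list(p0)]
    for w in psi:
        ps.append([sum(ps[-1][x] * w[x] * Pi[x][y] for x in range(m)) for y in range(m)])
    return ps


def filtering_estimate(p0, Pi, psi, beta, xi):
    """A = (w_N,1)/(p_N,1), with w_0 = beta p_0 and w_{n+1} = Pi^T(psi_n (w_n + xi p_n))."""
    m = len(p0)
    ps = forward_pass(p0, Pi, psi)
    w = [beta[x] * p0[x] for x in range(m)]
    for n, wt in enumerate(psi):
        w = [sum((w[x] + xi[x] * ps[n][x]) * wt[x] * Pi[x][y] for x in range(m))
             for y in range(m)]
    return sum(w) / sum(ps[-1])


def smoothing_estimate(p0, Pi, psi, beta, xi):
    """A = (rho_0, beta) + sum_{n<N} (rho_n, xi), with rho_n = p_n v_n / (p_N, 1)."""
    m = len(p0)
    ps = forward_pass(p0, Pi, psi)
    vs = [[1.0] * m]                                  # backward functions, v_N = 1
    for wt in reversed(psi):
        vs.insert(0, [wt[x] * sum(Pi[x][y] * vs[0][y] for y in range(m)) for x in range(m)])
    total = sum(ps[-1])                               # (p_N, 1) = (p_n, v_n) for every n
    rho = [[ps[n][x] * vs[n][x] / total for x in range(m)] for n in range(len(ps))]
    a = sum(rho[0][x] * beta[x] for x in range(m))
    for n in range(len(psi)):
        a += sum(rho[n][x] * xi[x] for x in range(m))
    return a


# Hypothetical toy data: two states, three observation steps.
Pi = [[0.9, 0.1], [0.2, 0.8]]
p0 = [0.5, 0.5]
psi = [[1.2, 0.7], [0.9, 1.1], [1.5, 0.6]]           # illustrative likelihood weights
beta, xi = [0.3, -0.2], [1.0, 2.0]
a_filter = filtering_estimate(p0, Pi, psi, beta, xi)
a_smooth = smoothing_estimate(p0, Pi, psi, beta, xi)
```

The two values agree. The smoothing marginals `rho` could be stored and reused for any other pair `(beta, xi)`, whereas the forward recursion for `w` would have to be rerun.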

The  next  section  will  be  devoted  to  applying  these  two  approaches  to  the  computation  of 
quantities  related  to  the  direct  likelihood  function  maximization  and  the  EM  algorithm. 


4  Application  to  the  MLE  problem 

4.1  Direct  maximization  of  the  likelihood  function 


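It follows from (5) that the log-likelihood function $L(\theta)$ can be expressed as

$$L(\theta) = \log(p^\theta_T,1),$$

with (see (17))

$$dp^\theta_t = \mathcal{L}^*_\theta p^\theta_t\,dt + h^*_\theta p^\theta_t\,r^{-1}\,dY_t, \qquad (26)$$

with initial condition $p^\theta_0$.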

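It follows from (8) and (10) that $\nabla L(\theta)$ belongs to the class of conditional expectations considered in Section 3. The approach based on filtering (Theorem 3.2) gives

$$\nabla L(\theta) = \frac{(w^\theta_T,1)}{(p^\theta_T,1)},$$

with $(p^\theta_t : 0\le t\le T)$ and $(w^\theta_t : 0\le t\le T)$ given respectively by (26) and (see (22))

$$dw^\theta_t = \mathcal{L}^*_\theta w^\theta_t\,dt + h^*_\theta w^\theta_t\,r^{-1}\,dY_t + [\nabla h_\theta]^* p^\theta_t\,r^{-1}\,dY_t + \mathcal{J}^*_\theta p^\theta_t\,dt, \qquad w^\theta_0 = \nabla p^\theta_0, \qquad (27)$$

where $\mathcal{J}_\theta\phi = [\nabla b_\theta]^* D\phi$.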

Remark. Equation (27) is exactly what would be obtained by formally differentiating equation (26) with respect to the parameter $\theta$. This result was indeed obtained in [4], relying on the existence of a “robust” (i.e. continuous with respect to observation sample paths) version of the Zakai equation.

If $\theta$ is a $p$-dimensional parameter, then the gradient $(w^\theta_t : 0\le t\le T)$ is a $p$-dimensional vector. Each component of this vector solves a stochastic PDE which is coupled only with $(p^\theta_t : 0\le t\le T)$ and with no other component. Moreover, the coupling occurs only through the “right-hand side”, and each of these $(p+1)$ stochastic PDE's has the same dynamics. In other words, one has to solve the same stochastic PDE with $(p+1)$ different “right-hand sides”. Note that smoothing could provide a more efficient method to deal with such a problem.


4.2  The  EM  algorithm 

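It follows from (6) and (7) that the auxiliary function $Q(\theta,\theta')$ belongs to the class of conditional expectations considered in Section 3. The approach based on filtering (Theorem 3.2) gives

$$Q(\theta,\theta') = \frac{(w^{\theta,\theta'}_T,1)}{(p^{\theta'}_T,1)},$$

with $(p^{\theta'}_t : 0\le t\le T)$ and $(w^{\theta,\theta'}_t : 0\le t\le T)$ given respectively by (26) and (see (22))

$$\begin{aligned}
dw^{\theta,\theta'}_t &= \mathcal{L}^*_{\theta'} w^{\theta,\theta'}_t\,dt + h^*_{\theta'} w^{\theta,\theta'}_t\,r^{-1}\,dY_t + [h_\theta - h_{\theta'}]^* p^{\theta'}_t\,r^{-1}\,dY_t + \mathcal{J}^*_{\theta,\theta'} p^{\theta'}_t\,dt \\
&\quad - \frac{1}{2}\Big([b_\theta - b_{\theta'}]^* a^{-1}[b_\theta - b_{\theta'}] + [h_\theta - h_{\theta'}]^* r^{-1}[h_\theta - h_{\theta'}]\Big)\,p^{\theta'}_t\,dt, \\
w^{\theta,\theta'}_0 &= \log\Big(\frac{p^\theta_0}{p^{\theta'}_0}\Big)\,p^{\theta'}_0,
\end{aligned}$$

where $\mathcal{J}_{\theta,\theta'}\phi = [b_\theta - b_{\theta'}]^* D\phi$.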

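On the other hand, smoothing (Theorem 3.5) gives

$$\begin{aligned}
Q(\theta,\theta') &= \Big(\rho^{\theta'}_0,\log\frac{p^\theta_0}{p^{\theta'}_0}\Big) + \int_0^T \Big(\rho^{\theta'}_s,[b_\theta - b_{\theta'}]^* D\Big(\log\frac{\rho^{\theta'}_s}{\pi^{\theta'}_s}\Big)\Big)\,ds \\
&\quad - \frac{1}{2}\int_0^T \Big(\rho^{\theta'}_s,[b_\theta - b_{\theta'}]^* a^{-1}[b_\theta - b_{\theta'}] + [h_\theta - h_{\theta'}]^* r^{-1}[h_\theta - h_{\theta'}]\Big)\,ds + A', \qquad (28)
\end{aligned}$$

$$\begin{aligned}
A' &= \int_0^T (\rho^{\theta'}_s,[h_\theta - h_{\theta'}]^*)\,r^{-1}\,dY_s - \int_0^T (\rho^{\theta'}_s,[h_\theta - h_{\theta'}]^*)\,r^{-1}\,(\rho^{\theta'}_s,h_{\theta'})\,ds \\
&= \int_0^T (\rho^{\theta'}_s,[h_\theta - h_{\theta'}]^*)\,r^{-1}\circ dY_s - \int_0^T (\rho^{\theta'}_s,[h_\theta - h_{\theta'}]^* r^{-1} h_{\theta'})\,ds, \qquad (29)
\end{aligned}$$

where $(\pi^{\theta'}_t : 0\le t\le T)$ and $(\rho^{\theta'}_t : 0\le t\le T)$ are the normalized filtering and smoothing densities. They are computed from the unique solutions $(p^{\theta'}_t : 0\le t\le T)$ and $(v^{\theta'}_t : 0\le t\le T)$ of (26) and (see (19))

$$dv^{\theta'}_t + \mathcal{L}_{\theta'} v^{\theta'}_t\,dt + h^*_{\theta'} v^{\theta'}_t\,r^{-1}\,dY_t = 0, \qquad v^{\theta'}_T = 1, \qquad (30)$$

respectively. Moreover, the non-adapted stochastic integrals in (29) are respectively a Skorokhod integral and a generalized Stratonovitch integral [6].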


Remark. It is now possible to give a more precise meaning to the (E-step) and (M-step) of the algorithm. Indeed, with $\theta'$ fixed:

3. (E-step) compute the normalized smoothing density $(\rho^{\theta'}_t : 0\le t\le T)$; this requires in particular computing the normalized filtering density $(\pi^{\theta'}_t : 0\le t\le T)$,

4. (M-step) maximize $Q(\cdot,\theta')$, where for each $\theta\in\Theta$ the computation of $Q(\theta,\theta')$ requires, according to (28), (i) at each time $t$, to integrate some functions depending on $(\theta,\theta')$ against the normalized smoothing density $\rho^{\theta'}_t$, and (ii) to integrate the resulting processes over the interval $[0,T]$.


Remark. A partial answer can be given to the question [M] raised in the Introduction. Indeed

• the differentiability of $\theta\mapsto Q(\theta,\theta')$ relies in an obvious way on the existence of derivatives with respect to $\theta$ of $p^\theta_0(\cdot)$, $b_\theta(\cdot)$ and $h_\theta(\cdot)$,

• computing the corresponding derivatives, and maximizing $\theta\mapsto Q(\theta,\theta')$, will not involve the computation of any other infinite-dimensional conditional density.
Moreover, as was pointed out in [2], there are particular cases in which the M-step can be dealt with explicitly. This includes the case where

• $\log p^\theta_0(\cdot)$ depends quadratically on $\theta$,

• $b_\theta(\cdot)$ and $h_\theta(\cdot)$ depend linearly on $\theta$,

since $\theta\mapsto Q(\theta,\theta')$ then becomes a quadratic form.


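It follows from (9) and (10) that $\nabla^{1,0}Q(\theta,\theta')$ belongs to the class of conditional expectations considered in Section 3. The approach based on filtering (Theorem 3.2) gives

$$\nabla^{1,0}Q(\theta,\theta') = \frac{(w^{\theta,\theta'}_T,1)}{(p^{\theta'}_T,1)},$$

with $(p^{\theta'}_t : 0\le t\le T)$ and $(w^{\theta,\theta'}_t : 0\le t\le T)$ given respectively by (26) and (see (22))

$$\begin{aligned}
dw^{\theta,\theta'}_t &= \mathcal{L}^*_{\theta'} w^{\theta,\theta'}_t\,dt + h^*_{\theta'} w^{\theta,\theta'}_t\,r^{-1}\,dY_t + [\nabla h_\theta]^* p^{\theta'}_t\,r^{-1}\,dY_t + \mathcal{J}^*_\theta p^{\theta'}_t\,dt \\
&\quad - \Big([\nabla b_\theta]^* a^{-1}[b_\theta - b_{\theta'}] + [\nabla h_\theta]^* r^{-1}[h_\theta - h_{\theta'}]\Big)\,p^{\theta'}_t\,dt, \\
w^{\theta,\theta'}_0 &= \frac{\nabla p^\theta_0}{p^\theta_0}\,p^{\theta'}_0,
\end{aligned}$$

where $\mathcal{J}_\theta\phi = [\nabla b_\theta]^* D\phi$.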


Remark. Comparing with (27), one can check once again that

$$\nabla^{1,0}Q(\theta',\theta') = \nabla L(\theta'),$$

as expected.


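As for the smoothing approach, one can use again the results of Section 3. Alternatively, one can directly differentiate with respect to $\theta$ the expression (28) for $Q(\theta,\theta')$, thus illustrating the point [M]. Indeed

$$\begin{aligned}
\nabla^{1,0}Q(\theta,\theta') &= \Big(\rho^{\theta'}_0,\frac{\nabla p^\theta_0}{p^\theta_0}\Big) + \int_0^T \Big(\rho^{\theta'}_s,[\nabla b_\theta]^* D\Big(\log\frac{\rho^{\theta'}_s}{\pi^{\theta'}_s}\Big)\Big)\,ds \\
&\quad - \int_0^T \Big(\rho^{\theta'}_s,[\nabla b_\theta]^* a^{-1}[b_\theta - b_{\theta'}] + [\nabla h_\theta]^* r^{-1}[h_\theta - h_{\theta'}]\Big)\,ds + A',
\end{aligned}$$

$$\begin{aligned}
A' &= \int_0^T (\rho^{\theta'}_s,[\nabla h_\theta]^*)\,r^{-1}\,dY_s - \int_0^T (\rho^{\theta'}_s,[\nabla h_\theta]^*)\,r^{-1}\,(\rho^{\theta'}_s,h_{\theta'})\,ds \\
&= \int_0^T (\rho^{\theta'}_s,[\nabla h_\theta]^*)\,r^{-1}\circ dY_s - \int_0^T (\rho^{\theta'}_s,[\nabla h_\theta]^* r^{-1} h_{\theta'})\,ds,
\end{aligned}$$

where the non-adapted stochastic integrals are respectively a Skorokhod integral and a generalized Stratonovitch integral [6].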
5  Time-discretization, and relation with MLE of parameters in partially observed Markov chains

Before turning to the presentation of the numerical results, it is worth describing the approach that has been adopted to actually compute the expressions obtained for $\nabla L(\theta)$ and $Q(\theta,\theta')$. From the results of the previous section, this should reduce in some sense to discretizing the stochastic PDE's (26), (27) and (30).

However, instead of discretizing these stochastic PDE's separately and, e.g., just plugging the resulting approximations into a discretized version of (28), a global approximation of the original continuous-time problem by a discrete-time problem will be presented. In particular

• the approximation $\bar L(\theta)$ to the log-likelihood function $L(\theta)$ of the continuous-time problem will be interpreted as the log-likelihood function of the discrete-time problem,

• the approximation $\bar Q(\theta,\theta')$ to the auxiliary function $Q(\theta,\theta')$ of the continuous-time problem will be such that the fundamental relation (2) holds for the discrete-time problem, i.e. $\bar L(\theta) - \bar L(\theta') \ge \bar Q(\theta,\theta')$.
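The fundamental relation can be checked numerically on a small discrete model. The sketch below makes hypothetical assumptions: a two-state chain whose initial law and transition matrix do not depend on $\theta$, and a scalar observation function $h_\theta(x) = \theta\,g(x)$. In that case $\log\Lambda_{\theta,\theta'}$ reduces to the sum of $\log(\Psi^\theta_n/\Psi^{\theta'}_n)(X_n)$. $\bar L$ is computed as $\log(p^\theta_N,1)$, and $\bar Q$ is computed from the smoothing marginals under $\theta'$. Jensen's inequality makes the gap non-negative, with equality at $\theta = \theta'$. The function names are illustrative only.

```python
import math


def weights(theta, g, dY, r, dt):
    """psi^theta_n(x) = exp(h dY_n / r - h^2 dt / (2 r)) with h = theta * g(x) (scalar case)."""
    return [[math.exp(theta * gx * d / r - 0.5 * (theta * gx) ** 2 * dt / r) for gx in g]
            for d in dY]


def forward_backward(p0, Pi, psi):
    """Unnormalized forward filters p_n and backward functions v_n (v_N = 1)."""
    m = len(p0)
    ps = [list(p0)]
    for w in psi:
        ps.append([sum(ps[-1][x] * w[x] * Pi[x][y] for x in range(m)) for y in range(m)])
    vs = [[1.0] * m]
    for w in reversed(psi):
        vs.insert(0, [w[x] * sum(Pi[x][y] * vs[0][y] for y in range(m)) for x in range(m)])
    return ps, vs


def log_likelihood(theta, model):
    """L(theta) = log (p^theta_N, 1)."""
    p0, Pi, g, dY, r, dt = model
    ps, _ = forward_backward(p0, Pi, weights(theta, g, dY, r, dt))
    return math.log(sum(ps[-1]))


def auxiliary(theta, theta_prev, model):
    """Q(theta, theta') = sum_n (rho^theta'_n, log psi^theta_n - log psi^theta'_n).

    Only the observation function depends on theta in this toy model, so the
    initial-law and transition-kernel parts of log Lambda vanish.
    """
    p0, Pi, g, dY, r, dt = model
    ps, vs = forward_backward(p0, Pi, weights(theta_prev, g, dY, r, dt))
    total = sum(ps[-1])
    q = 0.0
    for n, d in enumerate(dY):
        for x, gx in enumerate(g):
            rho = ps[n][x] * vs[n][x] / total         # smoothing marginal under theta'
            dlog = ((theta - theta_prev) * gx * d
                    - 0.5 * (theta ** 2 - theta_prev ** 2) * gx * gx * dt) / r
            q += rho * dlog
    return q


# Hypothetical model: (p0, Pi, g, dY, r, dt).
model = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [-1.0, 1.0],
         [0.12, -0.05, 0.3, 0.08, 0.2], 0.5, 0.1)
theta_prev = 0.7
gaps = [log_likelihood(t, model) - log_likelihood(theta_prev, model)
        - auxiliary(t, theta_prev, model) for t in (-1.0, 0.0, 0.5, 0.7, 1.5, 3.0)]
```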

Consider indeed the following discrete-time statistical model. Let first $(t_n : 0\le n\le N)$ be a uniform partition of the interval $[0,T]$ with time-step $\Delta t$. Suppose that on a measurable space $(\Omega,\mathcal{F})$ are given

• a family $(P^\theta : \theta\in\Theta)$ of probability measures,

• a discrete-time stochastic process $(X_n : 0\le n\le N)$ taking values in $\mathbb{R}^m$,

• a stochastic process $(Y_t : t\ge 0)$ taking values in $\mathbb{R}^d$,

such that under $P^\theta$, $(X_n : 0\le n\le N)$ is a Markov chain with transition probabilities kernel

$$\Pi_\theta = (I - \Delta t\,\mathcal{L}_\theta)^{-1} \qquad (31)$$

and initial density $p^\theta_0$, and this Markov chain is observed in continuous time through

$$dY_t = h_\theta(X_n)\,dt + dW_t, \qquad t_n\le t<t_{n+1},$$

where $(W_t : 0\le t\le T)$ is a Wiener process with covariance matrix $r$, independent of the Markov chain $(X_n : 0\le n\le N)$.


Remark. Equivalently, one can consider that the Markov chain is observed through the discrete-time measurements

$$y_n = \frac{\Delta Y_n}{\Delta t} = h_\theta(X_n) + w_n, \qquad \Delta Y_n = Y_{t_{n+1}} - Y_{t_n},$$

where $(w_n : 0\le n\le N)$ is a Gaussian white noise sequence with covariance matrix $r\,\Delta t^{-1}$, independent of the Markov chain $(X_n : 0\le n\le N)$.
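First, it follows from hypotheses (H…) that for all $x\in\mathbb{R}^m$, $(\Pi_\theta(x,\cdot) : \theta\in\Theta)$ are mutually absolutely continuous probability measures on $\mathbb{R}^m$. Define then

$$k_{\theta,\theta'}(x,y) = \frac{\Pi_\theta(x,dy)}{\Pi_{\theta'}(x,dy)}$$

as the corresponding Radon-Nikodym derivative. Define next

$$\Psi^\theta_n(x) = \exp\Big\{h^*_\theta(x)\,r^{-1}\,\Delta Y_n - \frac{1}{2}\,h^*_\theta(x)\,r^{-1}\,h_\theta(x)\,\Delta t\Big\}.$$

Then $(P^\theta : \theta\in\Theta)$ are mutually absolutely continuous probability measures on $(\Omega,\mathcal{F})$, with Radon-Nikodym derivative

$$\Lambda_{\theta,\theta'} = \frac{dP^\theta}{dP^{\theta'}} = \frac{p^\theta_0}{p^{\theta'}_0}(X_0)\,\prod_{n=0}^{N-1} k_{\theta,\theta'}(X_n,X_{n+1})\,\prod_{n=0}^{N-1}\frac{\Psi^\theta_n(X_n)}{\Psi^{\theta'}_n(X_n)}.$$

Consider also the probability measure $P^\dagger_\theta$ defined by

$$Z^\theta = \frac{dP^\theta}{dP^\dagger_\theta} = \prod_{n=0}^{N-1}\Psi^\theta_n(X_n),$$

so that under $P^\dagger_\theta$, $(Y_t : 0\le t\le T)$ is a Wiener process independent of the Markov chain $(X_n : 0\le n\le N)$.

Let again $(\mathcal{Y}_t : 0\le t\le T)$ denote the observation filtration. It turns out that the log-likelihood function for the estimation of the parameter $\theta$ is now defined by

$$\bar L(\theta) = \log E^\dagger_\theta(Z^\theta\mid\mathcal{Y}_T), \qquad (32)$$

whereas the auxiliary function is defined by

$$\bar Q(\theta,\theta') = E_{\theta'}(\log\Lambda_{\theta,\theta'}\mid\mathcal{Y}_T) = \frac{E^\dagger_{\theta'}(\log\Lambda_{\theta,\theta'}\,Z^{\theta'}\mid\mathcal{Y}_T)}{E^\dagger_{\theta'}(Z^{\theta'}\mid\mathcal{Y}_T)}. \qquad (33)$$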


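5.1  Direct maximization of the likelihood function

The idea is to find an equation for $(p^\theta_n : 0\le n\le N)$ defined by

$$(p^\theta_n,\phi) = E^\dagger_\theta(\phi(X_n)\,Z^\theta_n\mid\mathcal{Y}_{t_n}),$$

where

$$Z^\theta_n = \prod_{k=0}^{n-1}\Psi^\theta_k(X_k).$$

By definition

$$(p^\theta_{n+1},\phi) = E^\dagger_\theta(\phi(X_{n+1})\,Z^\theta_{n+1}\mid\mathcal{Y}_{t_{n+1}}) = E^\dagger_\theta(\Psi^\theta_n(X_n)\,[\Pi_\theta\phi](X_n)\,Z^\theta_n\mid\mathcal{Y}_{t_{n+1}}) = (p^\theta_n,\Psi^\theta_n\,\Pi_\theta\phi),$$

which results in the following equation

$$p^\theta_{n+1} = \Pi^*_\theta(\Psi^\theta_n\,p^\theta_n), \qquad p^\theta_0 \text{ given}. \qquad (34)$$

Using expression (31) for the transition probabilities kernel gives the following discretization scheme of the Zakai equation (26), which combines a Trotter-like product formula and an implicit Euler scheme:

$$(I - \Delta t\,\mathcal{L}^*_\theta)\,p^\theta_{n+1} = \Psi^\theta_n\,p^\theta_n, \qquad p^\theta_0 \text{ given}. \qquad (35)$$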
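In one space dimension, one step of (35) amounts to a tridiagonal linear solve. Below is a minimal sketch under several assumptions: a uniform grid, grid masses instead of densities, and an upwind generator with reflection at the grid ends. These are illustrative choices, not necessarily the scheme used for the reported results, and the function names are hypothetical. Because each row of the generator sums to zero, the total mass of $p_{n+1}$ equals that of $\Psi_n p_n$.

```python
import math


def solve_tridiagonal(sub, main, sup, rhs):
    """Thomas algorithm for sub[i] x[i-1] + main[i] x[i] + sup[i] x[i+1] = rhs[i]."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = sup[0] / main[0], rhs[0] / main[0]
    for i in range(1, n):
        denom = main[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_zakai_step(p, psi, lower, diag, upper, dt):
    """Solve (I - dt L^*) p_new = psi * p for a tridiagonal generator L = (lower, diag, upper).

    L^* is the transpose of L: below its diagonal sits upper[i-1], above it lower[i+1].
    """
    n = len(p)
    sub = [0.0] + [-dt * upper[i - 1] for i in range(1, n)]
    main = [1.0 - dt * diag[i] for i in range(n)]
    sup = [-dt * lower[i + 1] for i in range(n - 1)] + [0.0]
    return solve_tridiagonal(sub, main, sup, [w * q for w, q in zip(psi, p)])


# Illustrative upwind generator of dX = -X dt + dW on a reflecting grid of [-4, 4].
dx = 0.2
xs = [-4.0 + dx * k for k in range(41)]
lower, diag, upper = [], [], []
for i, x in enumerate(xs):
    left = 0.0 if i == 0 else 0.5 / dx**2 + max(x, 0.0) / dx
    right = 0.0 if i == len(xs) - 1 else 0.5 / dx**2 + max(-x, 0.0) / dx
    lower.append(left)
    upper.append(right)
    diag.append(-(left + right))

p = [math.exp(-0.5 * (x - 1.0) ** 2) for x in xs]
s = sum(p)
p = [v / s for v in p]                              # grid masses of the initial law
# psi_n for h(x) = x, r = 1, dt = 0.1 and an observed increment dY_n = 0.3 (hypothetical)
psi = [math.exp(0.3 * x - 0.05 * x * x) for x in xs]
p_next = implicit_zakai_step(p, psi, lower, diag, upper, dt=0.1)
```

The matrix $I - \Delta t\,\mathcal{L}^*$ is column diagonally dominant here, so elimination without pivoting, as done by the Thomas algorithm, is adequate.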

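It follows from (32) that the log-likelihood function $L(\theta)$ is therefore approximated by

$$\bar L(\theta) = \log(p^\theta_N,1). \qquad (36)$$

To approximate the gradient $\nabla L(\theta)$, one could either

• directly discretize equation (27), or

• derive the exact expression for the gradient of the approximated log-likelihood function $\bar L(\theta)$.

The second method is preferred, and gives

$$\nabla\bar L(\theta) = \frac{(w^\theta_N,1)}{(p^\theta_N,1)},$$

where, differentiating equation (35) with respect to the parameter $\theta$,

$$(I - \Delta t\,\mathcal{L}^*_\theta)\,w^\theta_{n+1} = \Psi^\theta_n\,w^\theta_n + \Delta t\,[\nabla\mathcal{L}^*_\theta]\,p^\theta_{n+1} + [\nabla\Psi^\theta_n]\,p^\theta_n, \qquad w^\theta_0 = \nabla p^\theta_0.$$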
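The differentiated recursion gives the exact gradient of $\bar L(\theta)$, which can be compared with a finite difference. For simplicity, the sketch assumes that $\theta$ enters only through a scalar observation function $h_\theta(x) = \theta\,g(x)$, so the term $\Delta t\,[\nabla\mathcal{L}^*_\theta]\,p_{n+1}$ vanishes. It also assumes that $\Pi_\theta = (I-\Delta t\,\mathcal{L}_\theta)^{-1}$ is available as a matrix, so (35) and its derivative reduce to products by $\Pi^T$. The two-state chain and all names are hypothetical.

```python
import math


def psi_and_dpsi(theta, g, dy, r, dt):
    """psi(x) = exp(theta g(x) dy / r - (theta g(x))^2 dt / (2 r)) and d psi / d theta."""
    psi = [math.exp(theta * gx * dy / r - 0.5 * (theta * gx) ** 2 * dt / r) for gx in g]
    dpsi = [w * gx * (dy - theta * gx * dt) / r for w, gx in zip(psi, g)]
    return psi, dpsi


def loglik_and_gradient(theta, p0, Pi, g, dY, r, dt):
    """Return log (p_N, 1) and its exact derivative (w_N, 1) / (p_N, 1)."""
    m = len(p0)
    p, w = list(p0), [0.0] * m        # w_0 = d p_0 / d theta = 0: p_0 is theta-free here
    for dy in dY:
        psi, dpsi = psi_and_dpsi(theta, g, dy, r, dt)
        # differentiated recursion w_{n+1} = Pi^T (psi_n w_n + dpsi_n p_n), using the old p_n
        w = [sum((psi[x] * w[x] + dpsi[x] * p[x]) * Pi[x][y] for x in range(m))
             for y in range(m)]
        p = [sum(psi[x] * p[x] * Pi[x][y] for x in range(m)) for y in range(m)]
    return math.log(sum(p)), sum(w) / sum(p)


Pi = [[0.9, 0.1], [0.2, 0.8]]                  # stands for (I - dt L)^{-1}, hypothetical
args = ([0.5, 0.5], Pi, [-1.0, 1.0], [0.12, -0.05, 0.3, 0.08, 0.2], 0.5, 0.1)
ll, grad = loglik_and_gradient(0.7, *args)
eps = 1e-5
fd = (loglik_and_gradient(0.7 + eps, *args)[0]
      - loglik_and_gradient(0.7 - eps, *args)[0]) / (2 * eps)
```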


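Remark (normalization). To avoid numerical overflow, one should rather solve the normalized equations instead of (35):

$$(I - \Delta t\,\mathcal{L}^*_\theta)\,\tilde p^\theta_{n+1} = \Psi^\theta_n\,\bar p^\theta_n, \qquad \bar p^\theta_{n+1} = \frac{\tilde p^\theta_{n+1}}{\gamma^\theta_{n+1}}, \qquad \bar p^\theta_0 = p^\theta_0,$$

where $\gamma^\theta_{n+1} = (\tilde p^\theta_{n+1},1)$. It is easily seen that $p^\theta_n = \Gamma^\theta_n\,\bar p^\theta_n$ with $\Gamma^\theta_n = \gamma^\theta_n\,\gamma^\theta_{n-1}\cdots\gamma^\theta_1$, and that $(p^\theta_n,1) = \Gamma^\theta_n$, so that

$$\bar L(\theta) = \log\Gamma^\theta_N = \sum_{n=0}^{N-1}\log\gamma^\theta_{n+1}.$$

In the same way, defining $\bar w^\theta_n$ by the relation $w^\theta_n = \Gamma^\theta_n\,\bar w^\theta_n$ gives

$$(I - \Delta t\,\mathcal{L}^*_\theta)\,\tilde w^\theta_{n+1} = \Psi^\theta_n\,\bar w^\theta_n + \Delta t\,[\nabla\mathcal{L}^*_\theta]\,\tilde p^\theta_{n+1} + [\nabla\Psi^\theta_n]\,\bar p^\theta_n, \qquad \bar w^\theta_{n+1} = \frac{\tilde w^\theta_{n+1}}{\gamma^\theta_{n+1}}, \qquad \bar w^\theta_0 = \nabla p^\theta_0.$$

Note that, although $w^\theta_n$ is the gradient of $p^\theta_n$, $\bar w^\theta_n$ is not the gradient of $\bar p^\theta_n$. Actually $\nabla\bar p^\theta_n = \bar w^\theta_n - (\bar w^\theta_n,1)\,\bar p^\theta_n$.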
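The normalized recursion can be sketched on a finite-state chain. As a simplifying assumption, the solve with $(I-\Delta t\,\mathcal{L}^*_\theta)$ is replaced by a product with the transpose of a given transition matrix `Pi`; the two coincide when `Pi` equals $(I-\Delta t\,\mathcal{L}_\theta)^{-1}$. The log-likelihood is accumulated as the sum of the logarithms of the normalizing constants $\gamma_{n+1} = (\tilde p_{n+1},1)$. This sum stays finite on long records, where the unnormalized $(p_N,1)$ overflows. Function names and data are hypothetical.

```python
import math


def weights(hs, dY, r, dt):
    """psi_n(x) = exp(h(x) dY_n / r - h(x)^2 dt / (2 r)) for scalar observations."""
    return [[math.exp(h * d / r - 0.5 * h * h * dt / r) for h in hs] for d in dY]


def normalized_loglik(p0, Pi, psi_seq):
    """Sum of log gamma_{n+1}, where p~_{n+1} = Pi^T(psi_n pbar_n) and gamma_{n+1} = (p~_{n+1}, 1)."""
    m = len(p0)
    pbar, total = list(p0), 0.0                 # p0 is assumed to be a probability vector
    for psi in psi_seq:
        ptilde = [sum(psi[x] * pbar[x] * Pi[x][y] for x in range(m)) for y in range(m)]
        gamma = sum(ptilde)
        pbar = [v / gamma for v in ptilde]      # normalized filter at the next time step
        total += math.log(gamma)
    return total


def unnormalized_loglik(p0, Pi, psi_seq):
    """log (p_N, 1) from the unnormalized recursion; overflows to inf on long records."""
    m = len(p0)
    p = list(p0)
    for psi in psi_seq:
        p = [sum(psi[x] * p[x] * Pi[x][y] for x in range(m)) for y in range(m)]
    return math.log(sum(p))


Pi = [[0.9, 0.1], [0.2, 0.8]]
p0 = [0.5, 0.5]
short_record = weights([-1.0, 1.0], [0.12, -0.05, 0.3, 0.08], 0.5, 0.1)
long_record = weights([-1.0, 1.0], [0.3] * 3000, 0.05, 0.1)
```

On the short record both functions return the same value. On the long record, only the normalized version stays finite.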


5.2  The EM algorithm

In the discrete-time case, it is rather straightforward to obtain the expression of the auxiliary function $\bar Q(\cdot,\cdot)$ in terms of nonlinear smoothing. It is nevertheless worth presenting a derivation that follows the same lines as in the continuous-time case. Indeed, there are two different methods for the computation of (33), one based on nonlinear filtering and the other on nonlinear smoothing.


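• Filtering

Define

$$\lambda^{\theta,\theta'}_n = \log\frac{p^\theta_0}{p^{\theta'}_0}(X_0) + \sum_{k=0}^{n-1}\log k_{\theta,\theta'}(X_k,X_{k+1}) + \sum_{k=0}^{n-1}\log\frac{\Psi^\theta_k}{\Psi^{\theta'}_k}(X_k).$$

The idea again is to find an equation for $(w^{\theta,\theta'}_n : 0\le n\le N)$ defined by

$$(w^{\theta,\theta'}_n,\phi) = E^\dagger_{\theta'}(\phi(X_n)\,\lambda^{\theta,\theta'}_n\,Z^{\theta'}_n\mid\mathcal{Y}_{t_n}).$$

First

$$w^{\theta,\theta'}_0 = p^{\theta'}_0\,\log\frac{p^\theta_0}{p^{\theta'}_0}.$$

Next, by definition

$$\begin{aligned}
(w^{\theta,\theta'}_{n+1},\phi) &= E^\dagger_{\theta'}(\phi(X_{n+1})\,\lambda^{\theta,\theta'}_{n+1}\,Z^{\theta'}_{n+1}\mid\mathcal{Y}_{t_{n+1}}) \\
&= E^\dagger_{\theta'}\Big(\phi(X_{n+1})\,\Psi^{\theta'}_n(X_n)\Big[\lambda^{\theta,\theta'}_n + \log k_{\theta,\theta'}(X_n,X_{n+1}) + \log\frac{\Psi^\theta_n}{\Psi^{\theta'}_n}(X_n)\Big]Z^{\theta'}_n\,\Big|\,\mathcal{Y}_{t_{n+1}}\Big) \\
&= E^\dagger_{\theta'}(\Psi^{\theta'}_n(X_n)\,[\Pi_{\theta'}\phi](X_n)\,\lambda^{\theta,\theta'}_n\,Z^{\theta'}_n\mid\mathcal{Y}_{t_{n+1}}) + E^\dagger_{\theta'}(\Psi^{\theta'}_n(X_n)\,[K_{\theta,\theta'}\phi](X_n)\,Z^{\theta'}_n\mid\mathcal{Y}_{t_{n+1}}) \\
&\quad + E^\dagger_{\theta'}\Big(\Psi^{\theta'}_n(X_n)\,\log\frac{\Psi^\theta_n}{\Psi^{\theta'}_n}(X_n)\,[\Pi_{\theta'}\phi](X_n)\,Z^{\theta'}_n\,\Big|\,\mathcal{Y}_{t_{n+1}}\Big),
\end{aligned}$$

where the operator $K_{\theta,\theta'}$ is defined by

$$(K_{\theta,\theta'}\phi)(x) = \int \phi(y)\,\log k_{\theta,\theta'}(x,y)\,\Pi_{\theta'}(x,dy).$$

Therefore, the resulting equation is

$$w^{\theta,\theta'}_{n+1} = \Pi^*_{\theta'}(\Psi^{\theta'}_n\,w^{\theta,\theta'}_n) + K^*_{\theta,\theta'}(\Psi^{\theta'}_n\,p^{\theta'}_n) + \Pi^*_{\theta'}\Big(\Psi^{\theta'}_n\log\frac{\Psi^\theta_n}{\Psi^{\theta'}_n}\,p^{\theta'}_n\Big), \qquad w^{\theta,\theta'}_0 = p^{\theta'}_0\log\frac{p^\theta_0}{p^{\theta'}_0}. \qquad (37)$$

It follows from (33) that the auxiliary function $Q(\theta,\theta')$ is approximated by

$$\bar Q(\theta,\theta') = \frac{(w^{\theta,\theta'}_N,1)}{(p^{\theta'}_N,1)}. \qquad (38)$$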
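Remark (normalization). Here again, one should rather solve the normalized equations instead of (39):

$$\tilde v^{\theta'}_n = \Psi^{\theta'}_n\,\Pi_{\theta'}\bar v^{\theta'}_{n+1}, \qquad \bar v^{\theta'}_n = \frac{\tilde v^{\theta'}_n}{c^{\theta'}_n}, \qquad \bar v^{\theta'}_N = 1,$$

where $c^{\theta'}_n$ is chosen in such a way that $(\bar p^{\theta'}_n,\bar v^{\theta'}_n) = 1$, which gives $c^{\theta'}_n = (\bar p^{\theta'}_n,\tilde v^{\theta'}_n) = \gamma^{\theta'}_{n+1}$. It is then easily seen that $p^{\theta'}_n = \Gamma^{\theta'}_n\,\bar p^{\theta'}_n$, and that $v^{\theta'}_n = (\gamma^{\theta'}_{n+1}\cdots\gamma^{\theta'}_N)\,\bar v^{\theta'}_n$. Moreover, the normalized smoothing density is given by $\rho^{\theta'}_n = \bar p^{\theta'}_n\,\bar v^{\theta'}_n$.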
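Remark. In terms of normalized conditional densities

$$\begin{aligned}
\bar Q(\theta,\theta') &= \Big(\rho^{\theta'}_0,\log\frac{p^\theta_0}{p^{\theta'}_0}\Big) + \sum_{n=0}^{N-1} E_{\theta'}\big(\log k_{\theta,\theta'}(X_n,X_{n+1})\,\big|\,\mathcal{Y}_T\big) \\
&\quad + \sum_{n=0}^{N-1}\big(\rho^{\theta'}_n,[h_\theta - h_{\theta'}]^*\,r^{-1}\,(\Delta Y_n - h_{\theta'}\,\Delta t)\big) - \frac{1}{2}\sum_{n=0}^{N-1}\big(\rho^{\theta'}_n,[h_\theta - h_{\theta'}]^*\,r^{-1}\,[h_\theta - h_{\theta'}]\big)\,\Delta t,
\end{aligned}$$

to be compared with (28), (29).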


Remark. It is now possible to give a more precise meaning to the (E-step) and (M-step) of the algorithm. Indeed, with $\theta'$ fixed:

3. (E-step) compute the normalized smoothing density $(\rho^{\theta'}_n : 0\le n\le N)$; this requires in particular computing the normalized filtering density $(\pi^{\theta'}_n : 0\le n\le N)$,

4. (M-step) maximize $\bar Q(\cdot,\theta')$, where for each $\theta\in\Theta$ the computation of $\bar Q(\theta,\theta')$ requires (i) at each time $n$, to integrate some functions depending on $(\theta,\theta')$ against the normalized smoothing density, and (ii) to sum the resulting discrete-time processes from $n = 0$ to $n = N-1$.
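In the special case noted in Section 4, where the relevant quantities depend linearly or quadratically on $\theta$, the M-step can be carried out in closed form. A minimal sketch follows, under hypothetical assumptions: scalar observations, an initial law and transition kernel that do not depend on $\theta$, and $h_\theta(x) = \theta\,g(x)$. Then $\theta\mapsto\bar Q(\theta,\theta')$ is, up to a constant, the quadratic $\sum_n(\rho^{\theta'}_n,\theta g\,\Delta Y_n/r - \theta^2 g^2\Delta t/(2r))$, and its maximizer is given in the docstring. The marginals $\rho^{\theta'}_n$ are the output of the E-step.

```python
def m_step_linear_observation(rho, g, dY, dt):
    """Maximize theta -> sum_n (rho_n, theta g dY_n / r - theta^2 g^2 dt / (2 r)).

    Setting the derivative to zero gives
        theta = sum_n (rho_n, g) dY_n / (dt * sum_n (rho_n, g^2)),
    in which the observation noise variance r cancels (hypothetical toy setting).
    """
    num = sum(sum(p * gx for p, gx in zip(rho_n, g)) * d for rho_n, d in zip(rho, dY))
    den = dt * sum(sum(p * gx * gx for p, gx in zip(rho_n, g)) for rho_n in rho)
    return num / den


# Check with point-mass marginals and noise-free increments dY_n = theta g(x_n) dt.
g = [-1.0, 0.5, 2.0]
path = [2, 0, 1, 2]
dt, theta_true = 0.1, 1.7
rho = [[1.0 if x == s else 0.0 for x in range(len(g))] for s in path]
dY = [theta_true * g[s] * dt for s in path]
theta_new = m_step_linear_observation(rho, g, dY, dt)
```

When the drift also depends on $\theta$, the remark above on the kernel $\Pi_\theta = (I-\Delta t\,\mathcal{L}_\theta)^{-1}$ shows that the discrete-time $\bar Q$ is generally no longer quadratic. In that case, a numerical maximization routine would be needed instead.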


Remark. With the time-discretization introduced above, the numerical implementation (including discretization with respect to the space variable) of the EM algorithm requires, in the M-step, the explicit evaluation of the transition probabilities kernel $\Pi_\theta = (I - \Delta t\,\mathcal{L}_\theta)^{-1}$. On the other hand, the numerical implementation of the direct maximization algorithm requires only the solution of linear equations with operator $(I - \Delta t\,\mathcal{L}_\theta)$, a much faster task.


Remark. There are some similarities between the discrete-time version of the EM algorithm and the statistical estimation of probabilistic functions of Markov processes. This theory was introduced in [1], and has found interesting applications in acoustic speech recognition [5]. Indeed, assume that observations are generated according to a hidden Markov model (HMM). To each possible state $x$ of a non-observed Markov chain, defined by its initial probability $p_0$ and its transition probabilities kernel $\Pi$, is associated a probability function $B(x,\cdot)$, which describes the conditional law of the observation given that the chain is in state $x$. Such a model will be denoted by $M = (p_0,\Pi,B)$. Then, under the additional assumption that both the Markov chain and the observation sequence take values in finite sets, the maximum likelihood estimation of the parameters of the hidden Markov model $M$ is achieved by an iterative procedure involving reestimation formulas [1,5]. These are obtained from the explicit maximization of an auxiliary function $Q(M,M')$.

Consider now the parametric model described above. It is possible to turn it into a parametric hidden Markov model $M_\theta = (p^\theta_0,\Pi_\theta,B_\theta)$ with

$$B_\theta(x,y) = (2\pi)^{-d/2}\,\det(r\,\Delta t^{-1})^{-1/2}\exp\Big\{-\frac{1}{2}\,[h_\theta(x) - y]^*\,r^{-1}\,[h_\theta(x) - y]\,\Delta t\Big\}.$$

In particular $B_\theta(x,y_n) \propto \Psi^\theta_n(x)$, where the proportionality factor depends neither on $x$ nor on $\theta$. Then it is easily seen that the auxiliary function defined in (33) is such that $\bar Q(\theta,\theta') = Q(M_\theta,M_{\theta'})$. Moreover, equations (34) and (39), which are known as Baum's forward and backward equations [1,5], play a central role in the theory of statistical estimation of hidden Markov models.
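The link with Baum's equations can be illustrated by a scaled forward-backward pass for an HMM with a Gaussian emission density of the form $B_\theta$ above, here for scalar observations. This minimal sketch uses hypothetical names and assumes a finite state space with one observation emitted from the current state at every time step, which is the usual HMM convention. The brute-force enumeration over all state paths is included only to check the pass on a tiny model. The scaling factors play a role similar to that of the normalization constants used with (35).

```python
import itertools
import math


def gaussian_emission(h, y, r, dt):
    """B(x, y): Gaussian density of a scalar y with mean h(x) and variance r / dt."""
    var = r / dt
    return math.exp(-0.5 * (y - h) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def baum_forward_backward(p0, Pi, hs, ys, r, dt):
    """Scaled forward-backward pass; returns the log-likelihood and smoothed marginals."""
    m = len(p0)
    B = [[gaussian_emission(h, y, r, dt) for h in hs] for y in ys]
    alphas, scales = [], []
    for n in range(len(ys)):
        if n == 0:
            alpha = [p0[x] * B[0][x] for x in range(m)]
        else:
            alpha = [sum(alphas[-1][x] * Pi[x][j] for x in range(m)) * B[n][j] for j in range(m)]
        c = sum(alpha)                               # scaling factor c_n
        alphas.append([a / c for a in alpha])
        scales.append(c)
    betas = [[1.0] * m]
    for n in range(len(ys) - 2, -1, -1):
        nxt = betas[0]
        betas.insert(0, [sum(Pi[x][j] * B[n + 1][j] * nxt[j] for j in range(m)) / scales[n + 1]
                         for x in range(m)])
    marginals = [[a * b for a, b in zip(al, be)] for al, be in zip(alphas, betas)]
    return sum(math.log(c) for c in scales), marginals


def brute_force(p0, Pi, hs, ys, r, dt):
    """Log-likelihood and marginals by enumerating every state path (tiny models only)."""
    m, total = len(p0), 0.0
    marg = [[0.0] * m for _ in ys]
    for path in itertools.product(range(m), repeat=len(ys)):
        w = p0[path[0]] * gaussian_emission(hs[path[0]], ys[0], r, dt)
        for n in range(1, len(ys)):
            w *= Pi[path[n - 1]][path[n]] * gaussian_emission(hs[path[n]], ys[n], r, dt)
        total += w
        for n, x in enumerate(path):
            marg[n][x] += w
    return math.log(total), [[v / total for v in row] for row in marg]


# Hypothetical model: (p0, Pi, h values per state, observations y_n, r, dt).
model = ([0.6, 0.4], [[0.9, 0.1], [0.2, 0.8]], [-1.0, 1.0], [0.4, -1.2, 0.9, 1.5], 0.5, 0.1)
ll, post = baum_forward_backward(*model)
ll_ref, post_ref = brute_force(*model)
```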
6  Numerical example

The continuous-time model is described by

$$dX_t = -\theta_2 X_t\,dt + \theta_3\,(\cdots)\,dt + dW_t, \qquad X_0\sim\mathcal{N}(\theta_1,\Sigma), \qquad (40)$$

$$dY_t = \theta_4\arctan(\cdots)\,dt + dV_t, \qquad (41)$$

and the unknown parameter is $\theta = (\theta_1,\theta_2,\theta_3,\theta_4)$. The noise covariances in the problem are $\Sigma$, $a$ and $r$, and can be associated with the parameters $\theta_1$, $(\theta_2,\theta_3)$ and $\theta_4$ respectively.

Although the unknown parameter is actually four-dimensional, results will be presented for the estimation of one component of $\theta$ at a time, and the influence of the “associated” noise covariance will be investigated.

For each of the cases presented below, the log-likelihood function has been maximized in order to find the MLE, using either the direct approach or the EM algorithm based on nonlinear smoothing. To achieve the direct maximization, one can rely on existing minimization routines from a scientific library, e.g. e04jbf from NAG, which uses a quasi-Newton algorithm and does not require the user to provide a routine for the computation of the gradient. On the other hand, the M-step of the EM algorithm can either

• be solved explicitly when applicable, e.g. when the auxiliary function depends quadratically on the parameter to be estimated, or

• rely on routines from a scientific library.

Two figures are given for each of the cases considered. The first figure shows the following objects:

• in solid line: the log-likelihood function $\bar L(\cdot)$ vs. the free parameter,

• in dashed line: iterations of the quasi-Newton algorithm for the direct maximization of the log-likelihood function $\bar L(\cdot)$, i.e. straight lines connecting successive points $A_0, A_1, \dots, A_n, \dots$ defined by

$$A_n = (\theta_n,\bar L(\theta_n)).$$

The second figure shows the following objects:

• in solid line: the log-likelihood function $\bar L(\cdot)$ vs. the free parameter,

• in dotted lines: the auxiliary functions corresponding to successive estimates, i.e. the functions $\bar L_n(\cdot) = \bar Q(\cdot,\theta_n) + \bar L(\theta_n)$, vs. the free parameter,

• in dashed lines: iterations of the EM algorithm, i.e. straight lines connecting successive points $A_0, B_0, A_1, B_1, \dots$ defined by

$$A_n = (\theta_n,\bar L(\theta_n)), \qquad B_n = (\theta_{n+1},\bar L_n(\theta_{n+1})).$$


Remark. In the example introduced above, the auxiliary function $Q(\theta,\theta')$ of the continuous-time model depends quadratically on the parameters $\theta_1$, $\theta_2$ and $\theta_3$. The discrete-time approximation $\bar Q(\theta,\theta')$, however, depends quadratically on $\theta_1$ only. This can be seen from the expression of the operator $K_{\theta,\theta'}$ (see (37)).


Description of case studies

In all these cases, the “true” value of the parameter (i.e. the value used for simulating sample paths of the observation process) is $(\theta_1,\theta_2,\theta_3,\theta_4) = (1.0, 0.25, 5.0, 2.0)$.

The time interval is $[0,T]$ with $T = 10.0$ and time-step $\Delta t = 0.1$. Observation process sample paths are simulated in the following way. First, a simple Euler time-discretization scheme is used to simulate the signal process (40). On this particular example, it is equivalent to the Milshtein scheme:

$$X_{n+1} = X_n + \big[-\theta_2 X_n + \theta_3\,(\cdots)\big]\,\Delta t + w_n,$$

with $X_0\sim\mathcal{N}(\theta_1,\Sigma)$ and $(w_n : 0\le n\le N)$ a Gaussian white noise sequence with covariance matrix $a\,\Delta t$. Next, discrete measurements are generated by

$$y_n = \theta_4\arctan(\cdots) + v_n,$$

with $(v_n : 0\le n\le N)$ a Gaussian white noise sequence with covariance matrix $r\,\Delta t^{-1}$, independent of $(w_n : 0\le n\le N)$.

These discrete measurements are used to solve equations (34) and (39), and therefore to compute the approximations $\bar L(\theta)$ and $\bar Q(\theta,\theta')$ defined by (32) and (33) respectively.
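The simulation procedure just described can be sketched as follows. Part of the drift in (40) and the argument of arctan in (41) cannot be recovered from the text. The sketch therefore takes the drift `drift` and the observation function `obs` as user-supplied callables; these names are hypothetical. The usage below keeps only the $-\theta_2 x$ part of the drift and uses `math.atan(x)` as an assumed observation argument, purely for illustration.

```python
import math
import random


def simulate(theta1, Sigma, drift, obs, a, r, dt, n_steps, seed=0):
    """Euler scheme for the signal and noisy discrete measurements (hypothetical helper).

    x_0 ~ N(theta1, Sigma), x_{n+1} = x_n + drift(x_n) dt + w_n with w_n ~ N(0, a dt),
    y_n = obs(x_n) + v_n with v_n ~ N(0, r / dt), for n = 0, ..., n_steps - 1.
    """
    rng = random.Random(seed)
    x = rng.gauss(theta1, math.sqrt(Sigma))
    xs, ys = [x], []
    for _ in range(n_steps):
        ys.append(obs(x) + rng.gauss(0.0, math.sqrt(r / dt)))
        x = x + drift(x) * dt + rng.gauss(0.0, math.sqrt(a * dt))
        xs.append(x)
    return xs, ys


# Illustrative run with T = 10, dt = 0.1 and theta2 = 0.25, theta4 = 2.0.
# The theta3 drift term is omitted and atan(x) is an assumed observation argument.
xs, ys = simulate(theta1=1.0, Sigma=1.0, drift=lambda x: -0.25 * x,
                  obs=lambda x: 2.0 * math.atan(x), a=1.0, r=1.0, dt=0.1, n_steps=100)
```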


• Estimation of $\theta_1$

Fixed parameters: $(\theta_2,\theta_3,\theta_4)$, set to their “true” values.

Noise variances: $a = 1.0$, $r = 1.0$, and $\Sigma = 1.0$ (Case I, figs. 1 and 2) or $\Sigma = 0.01$ (Case II, figs. 3 and 4).

In Case I the EM algorithm has converged after 11 iterations, whereas in Case II it has not converged after 200 iterations. Therefore, only the first 12 iterations are shown in fig. 4.


• Estimation of $\theta_3$

Fixed parameters: $(\theta_1,\theta_2,\theta_4)$, set to their “true” values.

Noise variances: $\Sigma = 1.0$, $r = 1.0$, and $a = 1.0$ (Case III, figs. 5 and 6) or $a = 0.01$ (Case IV, figs. 7 and 8).

In Case III the EM algorithm has converged after 5 iterations, whereas in Case IV it has not converged after 200 iterations. Therefore, only the first 12 iterations are shown in fig. 8.

• Estimation of $\theta_4$

Fixed parameters: $(\theta_1,\theta_2,\theta_3)$, set to their “true” values.

Noise variances: $\Sigma = 1.0$, $a = 1.0$, and $r = 1.0$ (Case V, figs. 9 and 10) or $r = 0.01$ (Case VI, figs. 11 and 12).

In Case V the EM algorithm has converged after 9 iterations, whereas in Case VI it has converged after 27 iterations.

The EM algorithm converges slowly when noise covariances are small (Cases II, IV and VI) because the log-likelihood function is then approximated from below by a set of very sharp auxiliary functions. In this situation, the current estimate cannot be updated significantly at each M-step. This can be seen directly from (6), (7), or equivalently from (28), (29). Assume for instance that both $p^\theta_0(\cdot)$ and $b_\theta(\cdot)$ are independent of $\theta$, and that the observation noise covariance $r$ is small. Then every auxiliary function $Q(\cdot,\theta')$ will certainly be very sharp. It should be stressed that in such cases, the slow variation of the estimate should not be interpreted as an indication that the algorithm has already converged, as one might be tempted to conclude.
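The effect can be reproduced on a deliberately simple static model. This is an illustration only, not the example of this section: $X\sim\mathcal{N}(\theta,\Sigma)$ is observed once as $y = X + v$ with $v\sim\mathcal{N}(0,r)$. The MLE is $\theta = y$, and one EM iteration maps $\theta_k$ to $E(X\mid y;\theta_k)$, so the error is multiplied by $r/(\Sigma + r)$ at each step. The auxiliary function has curvature $1/\Sigma$. When the covariance $\Sigma$ associated with $\theta$ is small, the auxiliary function is therefore very sharp and the iterates crawl. The function name is hypothetical.

```python
def em_scalar_gaussian(y, Sigma, r, theta0, tol=1e-6, max_iter=10_000):
    """EM for the toy model X ~ N(theta, Sigma), y = X + v, v ~ N(0, r).

    Returns (theta, number of iterations performed).
    """
    theta = theta0
    for k in range(1, max_iter + 1):
        # E-step: posterior mean of X given y under the current estimate theta
        posterior_mean = theta + Sigma / (Sigma + r) * (y - theta)
        # M-step: the maximizer of E[log N(X; theta, Sigma) | y] is that posterior mean
        theta = posterior_mean
        if abs(theta - y) < tol:     # the MLE of this toy model is theta = y
            return theta, k
    return theta, max_iter


theta_fast, k_fast = em_scalar_gaussian(y=1.0, Sigma=1.0, r=1.0, theta0=0.0)
theta_slow, k_slow = em_scalar_gaussian(y=1.0, Sigma=0.01, r=1.0, theta0=0.0)
theta_12, _ = em_scalar_gaussian(y=1.0, Sigma=0.01, r=1.0, theta0=0.0, max_iter=12)
```

With $\Sigma = 1$ the error halves at every step and 20 iterations suffice. With $\Sigma = 0.01$ the error shrinks by only about 1% per step. After 12 iterations, the estimate has covered about 11% of the distance to the MLE, even though successive iterates already differ by less than 0.01.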
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7  Conclusion 


The  direct  maximization  of  the  log-likelihood  function  has  been  compared  with  the  EM 
algorithm,  for  the  MLE  of  parameters  in  partially  observed  diffusion  processes.  Some  formulas 
given  in  [2]  have  been  clarified,  and  it  has  been  shown  that  smoothing  is  necesjiry  to  make 
the  EM  algorithm  approach  efficient.  On  the  other  hand,  formulas  have  been  given  in  terms 
of  filtering  stochastic  PDE’s  for  the  computation  of  the  original  log-likelihood  function  and  its 
gradient. 

It has been shown that:

[E]  the E-step in the EM algorithm is certainly slower than the direct computation of the log-likelihood function, since it involves nonlinear smoothing instead of nonlinear filtering.

[M]  the computation of the auxiliary function $Q(\cdot,\theta')$ in the M-step of the EM algorithm, with θ' fixed, requires (i) at each time t, integrating some functions depending on (θ, θ') against a normalized smoothing density depending only on θ', and (ii) integrating the resulting processes over the interval [0, T]. This is further evidence that the EM algorithm is computationally more demanding than the direct approach. The maximization of the auxiliary function, on the other hand, is generally simple to carry out.

[EM]  the EM algorithm converges very slowly whenever some noise covariances associated with the parameters to be estimated are small.

However, the EM algorithm should provide an interesting approach to non-parametric estimation for partially observed diffusion processes, i.e. non-parametric estimation of the initial density, the drift and the observation function. This form of the EM algorithm is used for finite-state Markov chains with finite-state observations (hidden Markov models). There it leads to well-known reestimation formulas that are of practical use, e.g. in acoustic speech recognition.




Abstract: We prove, under rather general conditions, that the unnormalized conditional density in nonlinear filtering is the unique solution of Zakai's equation within an appropriate space of processes. The main restriction is that all coefficients are assumed to be bounded.


* Partially supported by USACCE under contract DAJA 45-87-M-0296.
Introduction

It is well known that, in a nonlinear filtering problem for diffusion processes, a certain version of the unnormalized conditional law satisfies a linear stochastic partial differential equation called the Zakai equation (see below). It is therefore of interest to characterize this unnormalized conditional law as the unique solution, in a suitable sense, of the Zakai equation. Such a result splits into two parts. The first is a uniqueness theorem for the Zakai equation in a certain class of processes. The second is a regularity result showing that the unnormalized conditional law belongs to that same class.

Many results of this type have been established by various authors, in more or less general settings. For bounded coefficients, Kunita [11] and Szpirglas [17] established a uniqueness result when the signal and the observation noise are independent. Krylov-Rozovskii [10] and Pardoux [13] did so under uniform ellipticity assumptions. Various types of unbounded coefficients have been considered, with the signal and the observation noise independent, in Pardoux [15], Baras-Blankenship-Hopkins [1], Fleming-Mitter [6], Kallianpur-Karandikar [10], Ferreyra [5] and Kurtz-Ocone [13]; the latter article also treats cases with correlated noises. Bensoussan [2] considers unbounded coefficients when the signal and the observation are correlated, under an ellipticity condition. Haussmann [9] considers coefficients that depend on the observation. Finally, Cannarsa-Vespri [3] consider the uniqueness problem for the Zakai equation with unbounded coefficients and correlation between signal and noise when the observation is one-dimensional. Florchinger [7] establishes that this unique solution is the unnormalized law.

Our aim is to establish a very general result, while assuming that all coefficients are bounded and of class $C^\infty$. The initial law is arbitrary, and we make no non-degeneracy assumption whatsoever: the conditional law need not have a density. Moreover, the signal and the noise are correlated, and all coefficients depend on the whole past of the observation. This last point is very important for applications to stochastic control with partial observation.

We shall first establish a uniqueness result for a stochastic partial differential equation in Sobolev spaces of arbitrary index, using elementary properties of pseudo-differential operators. We shall then show that the unnormalized conditional law belongs to a certain Sobolev space of negative index.


1. Statement of the problem

We shall study the uniqueness of solutions of a stochastic partial differential equation of the following type. Here and throughout, we use the summation convention on repeated indices:

$$du_t = A u_t\,dt + B_i u_t\,dw_t^i,\qquad u_0\ \text{given}.\qquad (*)$$

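where:

(i) $\{w_t\}$ is a Wiener process with values in $\mathbb R^p$, defined on a filtered probability space $(\Omega,\mathcal F,\mathcal F_t,P)$.

(ii) A (resp. $B_i$) are differential operators on $\mathbb R^n$ of order 2 (resp. 1), of the form:

$$A = \frac12\sum_{i=1}^m X_i^2 + X_0 + \frac12\sum_{i=1}^p B_i^2 + c,\qquad B_i = Y_i + h_i,\quad 1\le i\le p,$$

where $X_0, X_1,\dots,X_m, Y_1,\dots,Y_p$ are vector fields on $\mathbb R^n$ and $c, h_1,\dots,h_p$ are functions on $\mathbb R^n$, all depending on $(\omega,t)\in\Omega\times\mathbb R_+$.

Let a denote any one of these functions, or any one of the coefficients of the vector fields. We assume that:

- $a:\Omega\times\mathbb R_+\times\mathbb R^n\to\mathbb R$ is $\mathcal P\otimes\mathcal B^n$ measurable, where $\mathcal P$ denotes the σ-field of progressively measurable events of $\Omega\times\mathbb R_+$ and $\mathcal B^n$ the Borel σ-field of $\mathbb R^n$;

- for every $(\omega,t)\in\Omega\times\mathbb R_+$, $a(\omega,t,\cdot)$ is in $C_b^\infty(\mathbb R^n)$, with bounds uniform in $(\omega,t)$.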

1.1. Classes of processes with values in Sobolev spaces

In this paragraph we recall the main definitions concerning Sobolev spaces and pseudo-differential operators. We refer to the book of Trèves [18] for a detailed exposition of this subject.

1.1.1. Sobolev spaces

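As in [4], for α in $\mathbb R$ we introduce the Bessel potential $\Lambda_\alpha$, which acts on $\mathcal S'(\mathbb R^n)$ (the space of tempered distributions) by the formula:

$$\widehat{\Lambda_\alpha f}(\xi) = (1+|\xi|^2)^{\alpha/2}\,\hat f(\xi).$$

The Sobolev space $H^\alpha(\mathbb R^n)$ consists of those $f\in\mathcal S'(\mathbb R^n)$ such that $\Lambda_\alpha f\in L^2(\mathbb R^n)$. It is equipped with the scalar product $\langle f,g\rangle_\alpha = \langle\Lambda_\alpha f,\Lambda_\alpha g\rangle_0$ and the norm $\|f\|_\alpha = \|\Lambda_\alpha f\|_0$.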
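Since $\Lambda_\alpha$ is a Fourier multiplier, $\|f\|_\alpha$ can be approximated numerically with the FFT. The sketch below works in dimension one on a large periodic box, with the convention $\hat f(\xi)=\int e^{-ix\xi}f(x)\,dx$. The box truncation, the grid and the function name `sobolev_norm` are assumptions made for illustration. The sketch also shows the phenomenon used in Theorem 3.2.2: a normalized bump approaching a Dirac mass has an exploding $L^2$ norm but a bounded $H^{-1}$ norm. (In dimension one, Dirac masses lie in $H^s$ for $s<-1/2$.)

```python
import numpy as np


def sobolev_norm(f_vals, dx, alpha):
    """Approximate ||f||_alpha = ||Lambda_alpha f||_{L^2} for f sampled on a uniform 1-D grid.

    Uses ||f||_alpha^2 = (1/2pi) * int (1 + xi^2)^alpha |f_hat(xi)|^2 dxi, with
    f_hat(xi) = int exp(-i x xi) f(x) dx approximated by the FFT; f should be
    negligible at both ends of the grid (periodic-box assumption).
    """
    n = f_vals.size
    f_hat = np.fft.fft(f_vals) * dx                 # |f_hat| on the discrete frequencies
    xi = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)      # angular frequencies
    dxi = 2.0 * np.pi / (n * dx)                    # frequency spacing
    weight = (1.0 + xi ** 2) ** alpha               # squared symbol of Lambda_alpha
    return float(np.sqrt(np.sum(weight * np.abs(f_hat) ** 2) * dxi / (2.0 * np.pi)))


if __name__ == "__main__":
    dx = 1e-3
    x = np.arange(-10.0, 10.0, dx)
    for w in (0.1, 0.01):
        # Gaussian bump of total mass 1, approaching a Dirac mass as w -> 0
        bump = np.exp(-x ** 2 / (2 * w ** 2)) / (w * np.sqrt(2 * np.pi))
        print(f"w={w}: L2 norm {sobolev_norm(bump, dx, 0.0):.3f}, "
              f"H^-1 norm {sobolev_norm(bump, dx, -1.0):.3f}")
```

For α = 0 the result coincides with the discrete $L^2$ norm, by Parseval's identity for the DFT.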


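1.1.2. Pseudo-differential operators

If A is a differential operator of order m with constant coefficients and symbol P, then

$$Au(x) = (2\pi)^{-n}\int_{\mathbb R^n} e^{i x\cdot\xi}\,P(\xi)\,\hat u(\xi)\,d\xi;$$

it is therefore a pseudo-differential operator of order m, with symbol P.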


If A is a differential operator with non-constant coefficients, one can show that there exists a pseudo-differential operator B such that A − B is a regularizing operator, i.e. an operator of order −∞. A can therefore be identified with an element of the quotient

$$\Psi^m(U)/\Psi^{-\infty}(U).$$

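Proposition: A pseudo-differential operator of order m on an open set $U\subset\mathbb R^n$ defines a continuous map from $H^s_c(U)$ into $H^{s-m}_{loc}(U)$, where

$$H^s_0(U) = \{u\in H^s(\mathbb R^n):\ \operatorname{supp}u\subset U\},\qquad H^s_c(U) = H^s_0(U)\cap\mathcal E'(U),$$

$$H^s_{loc}(U) = \{u\in\mathcal D'(U):\ \varphi u\in H^s_c(U)\ \text{for all}\ \varphi\in C_c^\infty(U)\}.$$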

Definition: Let $A\in\Psi^m(U)$ and $B\in\Psi^{m'}(U)$. Then $[A,B] = AB - BA$ belongs to $\Psi^{m+m'-1}(U)$. The space $\Psi(U) = \bigcup_{m\in\mathbb R}\Psi^m(U)$, equipped with this bracket, is a Lie algebra.

In the rest of the article, we shall work in the Lie subalgebra generated by the Bessel potentials and the differential operators.


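1.1.3. Classes of processes

We define the class of processes (cf. [4] and [14]):

$$\mathcal K_T^\alpha = L^2\big(\Omega\times[0,T];\,H^\alpha(\mathbb R^n)\big),$$

and, on this space, the norm

$$|||u|||_{\alpha,T} = \Big(E\int_0^T\|u(\omega,t)\|_\alpha^2\,dt\Big)^{1/2}.$$

We also introduce $\mathcal H_T^\alpha = \mathcal K_T^{\alpha+1}\cap L^2\big(\Omega;\,C([0,T];\,H^\alpha(\mathbb R^n))\big)$.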

1.2. Theorem

Let $\alpha\in\mathbb R$. Equation (*) admits at most one solution in $\bigcap_{T>0}\mathcal H_T^\alpha$ whose value at t = 0 is a given element of $H^\alpha(\mathbb R^n)$.


2. Proof of the theorem

The proof is divided into five steps.


2.1. Itô formula


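2.1.1. Proposition: Let u be a solution of (*) in $\mathcal H_T^\alpha$. Then $\Lambda_\alpha u$ is in $\mathcal H_T^0$ and satisfies, for all $0\le t\le T$:

$$\|\Lambda_\alpha u_t\|_0^2 = \|\Lambda_\alpha u_0\|_0^2 + \int_0^t\Big(2\langle\Lambda_\alpha u_s,\Lambda_\alpha Au_s\rangle + \sum_{i=1}^p\|\Lambda_\alpha B_iu_s\|_0^2\Big)ds + 2\int_0^t(\Lambda_\alpha u_s,\Lambda_\alpha B_iu_s)\,dw_s^i.$$

Here $\langle\cdot,\cdot\rangle$ denotes the duality between $H^1(\mathbb R^n)$ and $H^{-1}(\mathbb R^n)$.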

Proof: If u is in $\mathcal K_T^{\alpha+1}$, then by the assumptions on the operators, Au is in $\mathcal K_T^{\alpha-1}$ and $B_iu$ is in $\mathcal K_T^\alpha$. Hence $\Lambda_\alpha u$ is in $\mathcal K_T^1$, $\Lambda_\alpha Au$ in $\mathcal K_T^{-1}$ and $\Lambda_\alpha B_iu$ in $\mathcal K_T^0$. Moreover, we have the equality:

$$\Lambda_\alpha u_t = \Lambda_\alpha u_0 + \int_0^t\Lambda_\alpha Au_s\,ds + \int_0^t\Lambda_\alpha B_iu_s\,dw_s^i,\qquad\forall\,0\le t\le T.$$

Indeed, this amounts to showing that $\Lambda_\alpha$ commutes with Lebesgue and stochastic integration. This is obtained by approximating Au (resp. $B_iu$) by simple functions in $\mathcal K_T^{\alpha-1}$ (resp. $\mathcal K_T^\alpha$). The proposition then follows from the Itô formula of [14], whose hypotheses are trivially satisfied.

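2.1.2. Corollary: Let u be a solution of (*) in $\mathcal H_T^\alpha$. Since the stochastic integral in Proposition 2.1.1 has zero expectation:

$$E(\|u_t\|_\alpha^2) = E(\|u_0\|_\alpha^2) + E\int_0^t\Big(2\langle u_s,Au_s\rangle_\alpha + \sum_{i=1}^p\|B_iu_s\|_\alpha^2\Big)ds,\qquad0\le t\le T.$$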


In the following paragraphs, we estimate the terms under the integral in terms of $\|u_s\|_\alpha^2$, so that Gronwall's lemma can be applied.

2.2. A priori inequality in $H^0(\mathbb R^n)$

2.2.1. Proposition: There exists a constant K > 0 such that:

$$\forall f\in H^1(\mathbb R^n),\quad\langle Af,f\rangle_0\le K\|f\|_0^2 + \frac12\sum_{i=1}^p\langle f,B_i^2f\rangle_0.$$


Proof: We begin by stating a lemma that will be used several times below.

2.2.2. Lemma: Let X be a vector field of class $C_b^1$ on $\mathbb R^n$. Then:

(i) $X^* = -X + b$, where $b\in C_b(\mathbb R^n)$;

(ii) there exists a constant K such that $\forall f\in H^1(\mathbb R^n)$, $|(Xf,f)|\le K\|f\|_0^2$.

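Proof: If $X = \sum_{i=1}^n X_i\,\partial/\partial x_i$, then

$$X^* = -\sum_{i=1}^n\frac{\partial}{\partial x_i}(X_i\,\cdot\,) = -\sum_{i=1}^n X_i\frac{\partial}{\partial x_i} - \sum_{i=1}^n\frac{\partial X_i}{\partial x_i} = -X - \operatorname{div}X.$$

Let $f\in H^1(\mathbb R^n)$. Then

$$(Xf,f) = (f,X^*f) = -(f,Xf) - (f,\operatorname{div}X\cdot f).$$

Hence

$$(Xf,f) = -\frac12(f,\operatorname{div}X\cdot f),$$

and therefore

$$|(Xf,f)|\le\frac12\|\operatorname{div}X\|_\infty\|f\|_0^2.$$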
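In dimension one, with $X = a(x)\,d/dx$ and $\operatorname{div}X = a'$, Lemma 2.2.2 reduces to the integration-by-parts identity $\int a f' f\,dx = -\frac12\int a' f^2\,dx$. The sketch below checks this identity, and the bound of part (ii), by quadrature for one arbitrary choice: $a=\sin$ and $f(x)=e^{-x^2}$. These choices and the helper name are illustrative assumptions.

```python
import numpy as np


def lemma_222_sides(a, da, f, df, x):
    """Return ((Xf, f), -(1/2)(f, div X . f)) for X = a(x) d/dx on R, by Riemann sums.

    a, da: the coefficient of X and its derivative (div X = a' in dimension one);
    f, df: a rapidly decaying test function and its derivative.
    """
    dx = x[1] - x[0]
    lhs = np.sum(a(x) * df(x) * f(x)) * dx
    rhs = -0.5 * np.sum(da(x) * f(x) ** 2) * dx
    return float(lhs), float(rhs)


if __name__ == "__main__":
    x = np.linspace(-10.0, 10.0, 20001)
    f = lambda s: np.exp(-s ** 2)
    df = lambda s: -2.0 * s * np.exp(-s ** 2)
    lhs, rhs = lemma_222_sides(np.sin, np.cos, f, df, x)
    # bound of part (ii): (1/2) ||div X||_inf ||f||_0^2 with ||cos||_inf = 1
    bound = 0.5 * np.sum(f(x) ** 2) * (x[1] - x[0])
    print(lhs, rhs, abs(lhs) <= bound)
```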

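Continuation of the proof of Proposition 2.2.1.

By the definition of A, we have:

$$\langle Af,f\rangle_0 = \frac12\sum_{i=1}^m\langle X_i^2f,f\rangle_0 + \langle X_0f,f\rangle_0 + \langle cf,f\rangle_0 + \frac12\sum_{i=1}^p\langle B_i^2f,f\rangle_0.$$

It then follows from Lemma 2.2.2 that:

$$\langle Af,f\rangle_0\le\frac12\sum_{i=1}^m\langle X_i^2f,f\rangle_0 + K\|f\|_0^2 + \frac12\sum_{i=1}^p\langle f,B_i^2f\rangle_0.$$

It remains to study the term $\sum_{i=1}^m\langle X_i^2f,f\rangle_0$. Let $i\in\{1,\dots,m\}$. Then

$$\langle X_i^2f,f\rangle_0 = (X_if,X_i^*f) = -\|X_if\|_0^2 - (X_if,\operatorname{div}X_i\cdot f).$$

Since $-s^2 + st\le t^2/4$ for all real s, t, this gives

$$\langle X_i^2f,f\rangle_0\le\|X_if\|_0\big(\|\operatorname{div}X_i\|_\infty\|f\|_0 - \|X_if\|_0\big)\le\frac14\|\operatorname{div}X_i\|_\infty^2\|f\|_0^2,$$

which completes the proof.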

2.3. A priori inequality in $H^\alpha(\mathbb R^n)$

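2.3.1. Proposition: There exists a constant K > 0 such that:

$$\forall f\in H^{\alpha+1}(\mathbb R^n),\quad\langle Af,f\rangle_\alpha\le K\|f\|_\alpha^2 + \frac12\sum_{i=1}^p\langle\Lambda_\alpha f,B_i^2\Lambda_\alpha f\rangle_0.$$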

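Proof: Let $f\in H^{\alpha+1}(\mathbb R^n)$. Then

$$\langle Af,f\rangle_\alpha = \langle\Lambda_\alpha Af,\Lambda_\alpha f\rangle_0 = \langle A\Lambda_\alpha f,\Lambda_\alpha f\rangle_0 + \langle[\Lambda_\alpha,A]f,\Lambda_\alpha f\rangle_0.$$

The first term is estimated by Proposition 2.2.1 applied to $\Lambda_\alpha f$, which gives the bound $K\|f\|_\alpha^2 + \frac12\sum_{i=1}^p\langle\Lambda_\alpha f,B_i^2\Lambda_\alpha f\rangle_0$. To bound the second term, we use a commutation technique and pass to the adjoint. We introduce

$$g = \Lambda_\alpha f\qquad\text{and}\qquad T = [\Lambda_\alpha,A]\Lambda_{-\alpha},$$

so that

$$\langle[\Lambda_\alpha,A]f,\Lambda_\alpha f\rangle_0 = (Tg,g) = \frac12\big((T+T^*)g,g\big).$$

Let us compute $T^*$, recalling that $\Lambda_\alpha$ is self-adjoint:

$$T^* = \Lambda_{-\alpha}[A^*,\Lambda_\alpha] = [\Lambda_\alpha,-A^*]\Lambda_{-\alpha} + [\Lambda_{-\alpha},[A^*,\Lambda_\alpha]].$$

Now $A^* - A$ is an operator of order 1. Indeed, by Lemma 2.2.2, $X_i^* = -X_i - \operatorname{div}X_i$, so that

$$\frac12\sum_{i=1}^m\big((X_i^*)^2 - X_i^2\big) + X_0^* - X_0 = \sum_{i=1}^m\operatorname{div}X_i\,X_i - 2X_0 + c_0,$$

where $c_0 = \frac12\sum_{i=1}^m\big(X_i(\operatorname{div}X_i) + (\operatorname{div}X_i)^2\big) - \operatorname{div}X_0$ is a bounded function. The terms $\frac12\big((B_i^*)^2 - B_i^2\big)$ are likewise of order 1, since $B_i^* = -B_i + b_i$ with $b_i$ a bounded smooth function (Lemma 2.2.2 (i) applied to the vector-field part of $B_i$). Hence

$$T + T^* = [\Lambda_\alpha,A - A^*]\Lambda_{-\alpha} + [\Lambda_{-\alpha},[A^*,\Lambda_\alpha]]$$

is an operator of order 0. It is therefore bounded on $L^2(\mathbb R^n)$, so that $\frac12((T+T^*)g,g)\le K\|g\|_0^2 = K\|f\|_\alpha^2$, which allows us to conclude.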

2.4. Energy inequality in $H^\alpha(\mathbb R^n)$

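2.4.1. Proposition: There exists a constant K > 0 such that:

$$\forall f\in H^{\alpha+1}(\mathbb R^n),\quad\sum_{i=1}^p\|B_if\|_\alpha^2\le-\sum_{i=1}^p\langle\Lambda_\alpha f,B_i^2\Lambda_\alpha f\rangle_0 + K\|f\|_\alpha^2.$$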


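Proof: Let us begin with the case α = 0. By Lemma 2.2.2 (i) applied to the vector field $Y_i$, we have $B_i^* = -B_i + b_i$, where $b_i = 2h_i - \operatorname{div}Y_i$ is a bounded function. Hence, for $i\in\{1,\dots,p\}$:

$$\langle B_if,B_if\rangle_0 = \langle f,B_i^*B_if\rangle_0 = -\langle f,B_i^2f\rangle_0 + (f,b_iB_if).$$

The last term is estimated using Lemma 2.2.2 (ii) applied to the vector field $b_iY_i$, together with the boundedness of $b_ih_i$.

Now let α be any real number. Then

$$\|B_if\|_\alpha^2 = \|\Lambda_\alpha B_if\|_0^2 = \|B_i\Lambda_\alpha f\|_0^2 + 2\langle B_i\Lambda_\alpha f,[\Lambda_\alpha,B_i]f\rangle_0 + \|[\Lambda_\alpha,B_i]f\|_0^2.$$

Since the operator $[\Lambda_\alpha,B_i]$ is of order α, the last term is easily bounded by $K\|f\|_\alpha^2$. By the case α = 0, the first term is bounded by:

$$-\langle\Lambda_\alpha f,B_i^2\Lambda_\alpha f\rangle_0 + K\|f\|_\alpha^2.$$

It therefore remains to estimate $\langle B_i\Lambda_\alpha f,[\Lambda_\alpha,B_i]f\rangle_0 = (g,Tg)$, with $g = \Lambda_\alpha f$ and $T = B_i^*[\Lambda_\alpha,B_i]\Lambda_{-\alpha}$.

We argue as in the proof of Proposition 2.3.1: we write $(g,Tg) = \frac12((T+T^*)g,g)$ and show that $T + T^*$ is an operator of order 0. We use $B_i^* = -B_i + b_i$ and commute $\Lambda_{-\alpha}$ with $B_i$; every commutator produced in this way is of order 0 after composition. This gives

$$T^* = \Lambda_{-\alpha}[B_i^*,\Lambda_\alpha]B_i = B_i^*[\Lambda_\alpha,B_i^*]\Lambda_{-\alpha} + (\text{operator of order }0),$$

whence

$$T + T^* = B_i^*[\Lambda_\alpha,B_i + B_i^*]\Lambda_{-\alpha} + (\text{operator of order }0) = B_i^*[\Lambda_\alpha,b_i]\Lambda_{-\alpha} + (\text{operator of order }0).$$

This is an operator of order 0, since $[\Lambda_\alpha,b_i]$ is of order α − 1.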

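2.5. End of the proof

It follows from Corollary 2.1.2 and Propositions 2.3.1 and 2.4.1 that, if u is a solution of (*) in $\mathcal H_T^\alpha$, then:

$$E(\|u_t\|_\alpha^2)\le E(\|u_0\|_\alpha^2) + K\int_0^tE(\|u_s\|_\alpha^2)\,ds.$$

Gronwall's lemma then yields

$$E(\|u_t\|_\alpha^2)\le e^{Kt}E(\|u_0\|_\alpha^2),$$

whence $|||u|||_{\alpha,T}^2\le K'E(\|u_0\|_\alpha^2)$ for every $T\in\mathbb R_+$. Since (*) is linear, the difference of two solutions with the same initial value is a solution with zero initial value, and hence vanishes. We deduce the uniqueness of the solution of (*) in $\mathcal H_T^\alpha$ for every $T\in\mathbb R_+$, and hence in $\bigcap_{T>0}\mathcal H_T^\alpha$.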


3. Application to filtering
3.1. Description of the model

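Consider the pair (signal, observation), denoted $(x_t,y_t)$, that solves the following system:

$$dx_t = X_0(t,y,x_t)\,dt + X_i(t,y,x_t)\circ dw_t^i + \tilde X_j(t,y,x_t)\circ\big(dv_t^j + h_j(t,y,x_t)\,dt\big),\qquad(F)$$

$$dy_t = h(t,y,x_t)\,dt + dv_t.$$

Here $x_t\in\mathbb R^n$ and $y_t\in\mathbb R^p$; w is an m-dimensional and v a p-dimensional standard Wiener process. $(x_0,y_0)$ has law $\mu\otimes\delta_0$ and is independent of (v, w).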


Let a(t, y, x) denote any one of the components of the vector fields, or any one of the functions, appearing in this system. We make the following assumptions:

(i) $a:\mathbb R_+\times C(\mathbb R_+;\mathbb R^p)\times\mathbb R^n\to\mathbb R$ is $\mathcal P_y\otimes\mathcal B^n$ measurable, where $\mathcal P_y$ denotes the σ-field of progressively measurable events of $\mathbb R_+\times C(\mathbb R_+;\mathbb R^p)$.

(ii) For every pair $(t,y)\in\mathbb R_+\times C(\mathbb R_+;\mathbb R^p)$, the map $x\in\mathbb R^n\mapsto a(t,y,x)$ is of class $C_b^\infty(\mathbb R^n)$, and the bounds on a and its derivatives are independent of (t, y).

Note that under conditions (i) and (ii), the stochastic differential system (F) has a unique weak solution; in other words, the associated martingale problem is well posed. Indeed, let $(w_t,y_t;\ t\ge0)$ be the canonical process of $\Omega' = C(\mathbb R_+;\mathbb R^m)\times C(\mathbb R_+;\mathbb R^p)$, equipped with its Borel σ-field and the Wiener measure. Then the stochastic differential equation

$$dx_t = X_0(t,y,x_t)\,dt + X_i(t,y,x_t)\circ dw_t^i + \tilde X_j(t,y,x_t)\circ dy_t^j,$$

with $x_0$ given in $\mathbb R^n$, has a unique strong solution. A standard application of Girsanov's theorem then allows one to conclude.

Let $\mathcal Y_t$ be the σ-field generated by $y_s$, $s\le t$.

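3.2. The Zakai equation

Recall that we associate with the preceding filtering system a so-called reference probability measure $P^\circ$, defined by

$$\frac{dP}{dP^\circ}\Big|_{\mathcal F_t} = L_t = \exp\Big(\int_0^t h(s,y,x_s)\cdot dy_s - \frac12\int_0^t|h(s,y,x_s)|^2\,ds\Big).$$

The filter $\pi_tf = E(f(x_t)\,|\,\mathcal Y_t)$ associated with a bounded measurable function f can then be expressed in terms of the unnormalized filter $\rho_tf = E^\circ(f(x_t)L_t\,|\,\mathcal Y_t)$, via the Kallianpur-Striebel formula:

$$\pi_tf = \frac{\rho_tf}{\rho_t1}.$$

$\rho_t$ is a solution, in the sense of distributions, of the Zakai equation

$$d\rho_t = L_0^*\rho_t\,dt + L_j^*\rho_t\,dy_t^j,\qquad\rho_0 = \mu,\qquad(Z)$$

with:

$$L_0 = \frac12\Big(\sum_{i=1}^mX_i^2 + \sum_{j=1}^p\tilde X_j^2\Big) + X_0 + \sum_{j=1}^ph_j\tilde X_j,\qquad L_j = \tilde X_j + h_j,$$

where each vector field acts on functions by $X = \sum_{k=1}^nX^k\,\partial/\partial x_k$.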
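The Kallianpur-Striebel formula suggests a simple Monte Carlo approximation of $\rho_t$. Under the reference measure $P^\circ$ the observation is a Brownian motion independent of the signal, so one can simulate signal paths from their own dynamics and weight them by $L_t$. The sketch below does this for a scalar model $dx_t = b(x_t)\,dt + \sigma\,dw_t$, $dy_t = h(x_t)\,dt + dv_t$ with independent noises, using an Euler scheme and no resampling. The model, the discretization and the function name `ks_filter` are assumptions of the sketch, not the method of the paper.

```python
import numpy as np


def ks_filter(dy, dt, sample_x0, drift, sigma, h, n_particles=2000, rng=None):
    """Weighted-particle approximation of pi_t(x) = rho_t(x) / rho_t(1) (Kallianpur-Striebel).

    Scalar model, assumed here: dx = drift(x) dt + sigma dw, dy = h(x) dt + dv, with w and v
    independent. Under the reference measure P0 the observation is a Brownian motion
    independent of the signal, so particles follow the signal dynamics alone and carry
    the Girsanov weight L_t = exp(int h(x) dy - 1/2 int h(x)^2 ds).
    dy: array of observation increments over steps of length dt.
    Returns the approximate conditional means at times dt, 2 dt, ...
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_x0(n_particles, rng)
    log_L = np.zeros(n_particles)
    means = np.empty(len(dy))
    for k, dyk in enumerate(dy):
        hx = h(x)
        log_L += hx * dyk - 0.5 * hx ** 2 * dt            # Euler step for log L_t
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_particles)
        w = np.exp(log_L - log_L.max())                   # common factor cancels in the ratio
        means[k] = np.sum(w * x) / np.sum(w)              # rho_t(x) / rho_t(1)
    return means


if __name__ == "__main__":
    # Static unknown x ~ N(0, 1) observed through h(x) = x: the exact
    # posterior mean at time T is y_T / (1 + T).
    rng = np.random.default_rng(1)
    dt, n_steps, x_true = 0.01, 400, 1.5
    dy = x_true * dt + np.sqrt(dt) * rng.standard_normal(n_steps)
    m = ks_filter(dy, dt, lambda n, g: g.standard_normal(n), lambda x: 0.0 * x, 0.0,
                  lambda x: x, n_particles=20000, rng=np.random.default_rng(2))
    print(m[-1], dy.sum() / (1.0 + dt * n_steps))
```

The static example in the usage block has a closed-form posterior mean, which makes it a convenient sanity check. Over long horizons the weights degenerate, so practical particle filters add a resampling step.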

3.2.1. Theorem

Let $\alpha\in\mathbb R$. (Z) admits at most one solution in $\bigcap_{T>0}\mathcal H_T^\alpha$ whose value at t = 0 is given in $H^\alpha$.

Proof: The hypotheses of Theorem 1.2 are clearly satisfied.


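3.2.2. Theorem: For every probability measure μ on $\mathbb R^n$ and every ε > 0, the associated unnormalized filter is the unique solution of (Z) in

$$\bigcap_{T>0}\mathcal H_T^{-\frac n2-\varepsilon-1}.$$

Proof: We shall show that $\rho_0$ is in $H^{-\frac n2-\varepsilon}$ and ρ is in $\mathcal K_T^{-\frac n2-\varepsilon}$. By Lemma 1.4 of [14], the fact that ρ belongs to $L^2\big(\Omega;C([0,T];H^{-\frac n2-\varepsilon-1})\big)$ then follows from the fact that ρ satisfies (Z). (See Pardoux [16], where equation (Z) is established under assumptions different from ours; the same approach applies here.) We start by showing that, for every $t\in[0,T]$, $\rho_t$ is in $H^{-\frac n2-\varepsilon}$ a.s.:

$$\|\rho_t\|_{-\frac n2-\varepsilon}^2 = \int_{\mathbb R^n}(1+|\xi|^2)^{-\frac n2-\varepsilon}|\hat\rho_t(\xi)|^2\,d\xi\le\sup_{\xi\in\mathbb R^n}|\hat\rho_t(\xi)|^2\int_{\mathbb R^n}(1+|\xi|^2)^{-\frac n2-\varepsilon}\,d\xi,$$

and the last integral is finite since $2(\frac n2+\varepsilon) > n$. Now

$$\hat\rho_t(\xi) = E^\circ\big(e^{i\xi\cdot x_t}L_t\,\big|\,\mathcal Y_t\big),\qquad\text{hence}\qquad|\hat\rho_t(\xi)|\le E^\circ(L_t\,|\,\mathcal Y_t).$$

Integrating with respect to $dP^\circ\,dt$, and using Jensen's inequality together with the boundedness of h, we obtain:

$$|||\rho|||^2_{-\frac n2-\varepsilon,T}\le C\,T\,\sup_{0\le t\le T}E^\circ(L_t^2) < +\infty.$$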

Note: We have just become aware of a paper by Fujita [8], in which a uniqueness result is obtained by techniques similar to ours, but under more restrictive assumptions.
Piecewise linear filtering with small observation noise
W.H. Fleming†, D. Ji†, E. Pardoux‡


Abstract: We consider a piecewise linear filtering problem with small observation noise. It is shown that one can construct an approximate finite dimensional filter that uses a bank of Kalman filters, together with a test procedure to decide which Kalman filter to follow.


1.  Introduction 

The aim of this paper is to propose an approximate optimal filter for the filtering problem:

$$dx_t = f(x_t)\,dt + dw_t,\qquad dy_t = h(x_t)\,dt + \varepsilon\,dv_t,$$


where $\{x_t\}$ is a scalar unobserved process, $\{y_t\}$ is a scalar observed process, ε is a "small" parameter, and $\{w_t\}$ and $\{v_t\}$ are mutually independent standard Wiener processes. We assume that $\mathbb R = \bigcup_{i=1}^\ell I_i$, where $I_1,\dots,I_\ell$ are disjoint intervals, and that f and h are continuous mappings from $\mathbb R$ into $\mathbb R$ whose restrictions to each $I_i$ are affine.

Roughly speaking, our result is as follows. Provided a certain "detectability hypothesis" is satisfied, an approximate optimal filter is given by one of a set of ℓ Kalman filters. The decision about which Kalman filter to follow for a given period of time is taken in view of the outputs of the ℓ Kalman filters.

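Let us sketch the general ideas on a simple example. Suppose that ℓ = 2, $I_1 = \mathbb R_-$ and $I_2 = \mathbb R_+$, and that:

$$f(x) = \begin{cases}F_+x & \text{if } x > 0\\ F_-x & \text{if } x < 0\end{cases}\qquad h(x) = \begin{cases}H_+x & \text{if } x > 0\\ H_-x & \text{if } x < 0.\end{cases}$$

We now consider the two linear filtering problems:

$$\begin{cases}dx_t = F_+x_t\,dt + dw_t\\ dy_t = H_+x_t\,dt + \varepsilon\,dv_t\end{cases}\qquad\qquad\begin{cases}dx_t = F_-x_t\,dt + dw_t\\ dy_t = H_-x_t\,dt + \varepsilon\,dv_t\end{cases}$$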


† Brown University, Providence, RI, USA, partially supported by NSF under grant MCS-8121940 and by AFOSR under contracts F-49620-86C-0111 and AFOSR-86-0315.

‡ Université de Provence, F13331 Marseille and INRIA, partially supported by USACCE under contract DAJA45-87-M-0296.
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to  which  one  associates  two  Kalman  filters  (KF^)  and  (KF-),  with  outputs  {xf,R}')  and 

(X|  ,  Rf  ). 

If $H_+H_- > 0$, then h is one to one. Since ε is small, we can then almost deduce the current value of $h(x_t)$, and hence of $x_t$, from $\{y_s;\ 0\le s\le t\}$. More precisely, from the results of Picard [8], [9], [10] (see also Katzur-Bobrovsky-Schuss [6], Bensoussan [1], Ji [5]), we know that the conditional law of $x_t$ given $\mathcal Y_t = \sigma\{y_s;\ 0\le s\le t\}$ has a small variance.

Suppose for instance that $x_t > 0$ and that $h(x_s)$ is significantly different from zero for every $s\in[t-\alpha,t]$ (in which case the same is true for $x_s$). Then the conditional law of $x_t$ given $\mathcal Y_t$ is almost completely concentrated on $\mathbb R_+$, at least with probability almost one. Consequently, the output of (KF_+) is very close to the conditional law of $x_t$ given $\mathcal Y_t$, at least with probability almost one. As we will see below, the way in which (KF_+) is initialized does not play a significant role.

Suppose now that $H_+H_- < 0$. Then we need some "detectability hypothesis". Indeed, if f(x) = 0 and h(x) = |x|, then clearly the conditional law of $x_t$ given $\mathcal Y_t$ is symmetric with respect to 0, and it cannot be reasonably approximated by the output of a Kalman filter. Suppose moreover that $|H_+|\ne|H_-|$. Then, for ε = 0, the quadratic variation of $dy_t/dt = h(x_t)$ tells us whether $x_t < 0$ or $x_t > 0$. One may then expect that, for ε > 0 but small, the conditional law of $x_t$ given $\mathcal Y_t$ again has a small variance. One may also expect that a decision about which of (KF_+) or (KF_-) to follow can be reached by comparing the outputs of these two filters. The proof of these facts is the crucial step in our argument.

Our results are illustrated by the numerical results in Fleming et al. [3]. We stress that the hypothesis of a high signal-to-noise ratio is crucial for the validity of the algorithm we propose. Without that hypothesis, the conditional law would spread out over the whole real line, and probably none of the Kalman filters would give an acceptable approximation of it. A totally different algorithm is proposed for that situation in Pardoux-Savona [7].

Generalisations to higher dimensional situations, as well as to the case where f and h are nonlinear and h is piecewise one to one, will be considered elsewhere.

The paper is organised as follows. In section 2, we formulate the problem and the assumptions precisely, and state some technical results that will be needed in the sequel. Sections 3, 4 and 5 study in detail the case where ℓ = 2, $I_1 = \mathbb R_-$, $I_2 = \mathbb R_+$, and h is not globally one-to-one and satisfies a "detectability hypothesis". In section 6, we summarize an approximate filtering procedure for the case studied in the previous sections, and indicate the procedure in the general case.
2.  Formulation  of  the  problem  and  preliminary  lemmas.

Let $\Omega = C(\mathbb R_+)\times C(\mathbb R_+)$, let $\mathcal F$ be its Borel field, and let $x_t(\omega) = \omega_1(t)$, $y_t(\omega) = \omega_2(t)$. Let $P^\varepsilon$ be a probability measure on $(\Omega,\mathcal F)$ such that:

$$x_t = x_0 + \int_0^t f(x_s)\,ds + w_t,\qquad(2.1)$$

$$y_t = \frac1\varepsilon\int_0^t h(x_s)\,ds + v_t,\qquad(2.2)$$

where $\{w_t\}$ and $\{v_t\}$ are two mutually independent standard Wiener processes, and $x_0$ is a random variable independent of $\{w_t,v_t;\ t\ge0\}$ with $E^\varepsilon[\exp(c\,x_0^2)] < \infty$ for some c > 0. The maps f and h are continuous from $\mathbb R$ into $\mathbb R$ and have the following special form. We assume that $\mathbb R = \bigcup_{i=1}^\ell I_i$, where $I_1,\dots,I_\ell$ are closed intervals with disjoint interiors, and that the restrictions of f and h to each $I_i$ are affine functions, i.e.

$$f(x) = F_ix + f_i,\qquad x\in I_i,\ 1\le i\le\ell,$$

$$h(x) = H_ix + h_i,\qquad x\in I_i,\ 1\le i\le\ell,$$

where $F_1,\dots,F_\ell,f_1,\dots,f_\ell,H_1,\dots,H_\ell,h_1,\dots,h_\ell\in\mathbb R$.

It is well known that $P^\varepsilon$ exists and is unique; see e.g. Stroock-Varadhan [11]. $\{x_t\}$ is an unobserved process, while $\{y_t\}$ is observed. We define $\mathcal Y_t = \sigma\{y_s;\ 0\le s\le t\}$ and seek to compute, at each time t, the conditional law of $x_t$ given $\mathcal Y_t$. Our aim is in fact to obtain an asymptotic result, as ε → 0, concerning a finite dimensional filter to be described later.
We will assume throughout the paper that:

(H1)  $H_i\ne0$, $1\le i\le\ell$.

Let us now formulate a "detectability hypothesis", which will be assumed to hold throughout the paper:

(H2)  For any pair $(i,j)\in\{1,\dots,\ell\}^2$ such that $i\ne j$ and $h(I_i)\cap h(I_j)$ has a non-void interior, $H_i^2\ne H_j^2$.

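For i = 1, ..., ℓ, we can consider a Kalman filter (KF_i), which is the optimal filter for the case where:

$$f(x) = F_ix + f_i,\qquad h(x) = H_ix + h_i;\qquad x\in\mathbb R.$$

The Riccati equation for the conditional covariance in (KF_i) reads:

$$\frac{dR_t^i}{dt} = 2F_iR_t^i + 1 - \frac{H_i^2}{\varepsilon^2}(R_t^i)^2.$$

For small ε, this equation has a unique stable positive invariant solution, equal to:

$$\bar R_i = \frac\varepsilon{|H_i|}\Big(\frac{\varepsilon F_i}{|H_i|} + \sqrt{1 + \frac{\varepsilon^2F_i^2}{H_i^2}}\Big).$$

Let us define $K_i = H_i\bar R_i/\varepsilon = \operatorname{sign}(H_i)\Big(\frac{\varepsilon F_i}{|H_i|} + \sqrt{1 + \frac{\varepsilon^2F_i^2}{H_i^2}}\Big)$. The optimal Kalman filter associated with the initial law $N(E(x_0),\bar R_i)$ is given by:

$$d\hat x_t^i = (F_i\hat x_t^i + f_i)\,dt + K_i\Big(dy_t - \frac1\varepsilon(H_i\hat x_t^i + h_i)\,dt\Big),\qquad\hat x_0^i = E(x_0).$$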
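The stationary variance and gain can be checked numerically. The following is a minimal sketch. It assumes the Riccati equation $dR/dt = 2F_iR + 1 - (H_i/\varepsilon)^2R^2$, which corresponds to the scaling of (2.2), and the function names are hypothetical. It verifies three things: $\bar R_i$ is a root of the stationary equation; forward Euler integration of the Riccati equation from R(0) = 0 converges to $\bar R_i$; and $\bar R_i\approx\varepsilon/|H_i|$ as ε → 0.

```python
import math


def stationary_riccati(F, H, eps):
    """Positive root R of 0 = 2 F R + 1 - (H / eps)**2 R**2: the stationary
    conditional variance of the scalar Kalman-Bucy filter for
    dx = (F x + f) dt + dw,  dy = (1/eps)(H x + h) dt + dv  (scaling of (2.2))."""
    q = eps * F / abs(H)
    return (eps / abs(H)) * (q + math.sqrt(1.0 + q * q))


def riccati_rhs(R, F, H, eps):
    """Right-hand side of the Riccati equation dR/dt = 2 F R + 1 - (H/eps)^2 R^2."""
    return 2.0 * F * R + 1.0 - (H / eps) ** 2 * R ** 2


def integrate_riccati(R0, F, H, eps, T, n_steps):
    """Explicit Euler integration of the Riccati equation on [0, T]."""
    dt = T / n_steps
    R = R0
    for _ in range(n_steps):
        R += dt * riccati_rhs(R, F, H, eps)
    return R


def stationary_gain(F, H, eps):
    """Gain K = H * R_bar / eps multiplying the innovation dy - (1/eps)(H xhat + h) dt."""
    return H * stationary_riccati(F, H, eps) / eps


if __name__ == "__main__":
    for F, H in [(-1.0, 1.0), (0.5, -2.0)]:
        for eps in (0.1, 0.01):
            R = stationary_riccati(F, H, eps)
            print(F, H, eps, R, eps / abs(H), stationary_gain(F, H, eps))
```

The Euler step is kept well below the inverse of the linearized decay rate $2H_i^2\bar R_i/\varepsilon^2$; a larger step can overshoot into negative values.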

In most of the paper, we will concentrate on the case ℓ = 2, in which we will assume, without loss of generality, that:

$$f_1 = f_2 = h_1 = h_2 = 0,\qquad I_1 = \mathbb R_-,\quad I_2 = \mathbb R_+.$$

We will then use the notation:

$$I_- = I_1,\ F_- = F_1,\ H_- = H_1;\qquad I_+ = I_2,\ F_+ = F_2,\ H_+ = H_2.$$

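Let us close this section with three lemmas. The proof of the first one is easy and is left to the reader.

Lemma 2.1. Let $\{U_k;\ 1\le k\le M\}$ be i.i.d. random variables with common law N(0, δ). Then for any a > 0,

$$P\Big(\max_{1\le k\le M}|U_k| > a\Big)\le1 - \big(1 - e^{-a^2/2\delta}\big)^M.$$

Consequently, when δ → 0 and M → ∞ in such a way that Mδ = C,

$$P\Big(\max_{1\le k\le M}|U_k| > a\Big)\le\frac C\delta\,e^{-a^2/2\delta}\longrightarrow0.$$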
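Lemma 2.1 rests on the Gaussian tail bound $P(|U_1| > a)\le e^{-a^2/2\delta}$, or equivalently $\operatorname{erfc}(s)\le e^{-s^2}$ for s ≥ 0. The sketch below compares the exact probability $1-(1-\operatorname{erfc}(a/\sqrt{2\delta}))^M$ with the bound $1-(1-e^{-a^2/2\delta})^M$ along a sequence with Mδ fixed. The function names and parameter values are illustrative.

```python
import math


def max_exceed_exact(a, delta, M):
    """P(max_{k<=M} |U_k| > a) for U_1, ..., U_M i.i.d. N(0, delta), with a > 0."""
    p = math.erfc(a / math.sqrt(2.0 * delta))       # P(|U_1| > a)
    return -math.expm1(M * math.log1p(-p))          # 1 - (1 - p)**M, computed stably


def max_exceed_bound(a, delta, M):
    """Upper bound 1 - (1 - exp(-a^2 / (2 delta)))**M of Lemma 2.1, with a > 0."""
    q = math.exp(-a * a / (2.0 * delta))
    return -math.expm1(M * math.log1p(-q))


if __name__ == "__main__":
    C, a = 1.0, 0.5
    for M in (10, 100, 1000):
        delta = C / M                               # keep M * delta = C
        print(M, max_exceed_exact(a, delta, M), max_exceed_bound(a, delta, M))
```

`expm1` and `log1p` avoid the cancellation that `1 - (1 - p)**M` suffers when p is tiny.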

Lemma 2.2. Let $\{\zeta_k;\ k\in\mathbb N\}$ be a sequence of i.i.d. random variables, and let $\Phi(u) = E[\exp(u\zeta_1)]$. Suppose that Φ is finite on a neighbourhood of the origin, and that $\{u;\ \Phi(u)\le k\}$ is closed for any $k\in\mathbb R_+$. Call μ the common mean of the $\zeta_k$'s. For any θ > 0, there exists C > 0 such that for any $n\in\mathbb N$:

$$P\Big(\Big|\frac1n\sum_{k=1}^n\zeta_k - \mu\Big| > \theta\Big)\le2e^{-Cn}.$$

Proof: This is a large deviation estimate, which can be found e.g. in Ellis [2]. □

Lemma 2.3. Let $\{x_t,\ t\ge0\}$ denote the solution of (2.1). Then for any T > 0, there exists c > 0 such that $E^\varepsilon\big[\exp\big(c\sup_{0\le t\le T}x_t^2\big)\big] < \infty$.

Proof: This is Theorem 5.7.2 in Kallianpur [5]. □
3. The case of two intervals with $H_+H_- < 0$. First step.

We shall treat the case $H_+ > 0$, $H_- < 0$ (i.e. $h(x)\ge0$ for all $x\in\mathbb R$) and use the notation $|H| = \sup(H_+,|H_-|)$, $|F| = \sup(|F_+|,|F_-|)$. Recall that assumption (H2) is in force. Since we want to decide on which side of 0 $x_t$ lies, we first need to find time intervals on which no zero crossing takes place, at least with conditional probability almost one.

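Let 0 ≤ a < b and $M = [(b-a)/\varepsilon]$, and for l = 0, 1, ..., M − 1 define:

$$V_l^\varepsilon = y_{a+(l+1)\varepsilon} - y_{a+l\varepsilon},\qquad S_l^\varepsilon = \frac1\varepsilon\int_{a+l\varepsilon}^{a+(l+1)\varepsilon}h(x_s)\,ds.$$

Note that, by (2.2), $V_l^\varepsilon = S_l^\varepsilon + V_l^{\varepsilon*}$, where $V_l^{\varepsilon*} = v_{a+(l+1)\varepsilon} - v_{a+l\varepsilon}$. The $V_l^{\varepsilon*}$ are therefore i.i.d. N(0, ε).

Define moreover the events:

$$B_+(a,b) = \{x_t > 0;\ a\le t\le b\},\qquad B_-(a,b) = \{x_t < 0;\ a\le t\le b\}.$$

When there is no ambiguity, we shall simply write $B_+$ and $B_-$. Choose c > 0, and define:

$$C_\varepsilon = \{|V_l^\varepsilon| > c;\ 0\le l\le M-1\}.$$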

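Proposition 3.1. For any $\varepsilon_0 > 0$, there exists k such that for any $\varepsilon\in(0,\varepsilon_0)$,

$$P^\varepsilon\big((B_+\cup B_-)^c\cap C_\varepsilon\big)\le k\,e^{-1/(k\varepsilon)}.$$

Proof. If $|V_l^\varepsilon| > c$ and $|V_l^{\varepsilon*}| < c/2$, then $|S_l^\varepsilon| > c/2$. This implies that there exists $s_l\in[a+l\varepsilon,a+(l+1)\varepsilon]$ such that $|h(x_{s_l})| > c/2$, hence $|x_{s_l}| > c_1 = c/(2|H|)$. On $(B_+\cup B_-)^c$ the path x reaches 0 somewhere in [a, b]. It follows that on $(B_+\cup B_-)^c\cap C_\varepsilon$ we must have either:

$$\sup_{0\le l\le M-1}|V_l^{\varepsilon*}|\ge\frac c2,$$

or else

$$\sup_{0\le l\le M-1}\ \sup_{a+l\varepsilon\le s\le a+(l+1)\varepsilon}|x_s - x_{s_l}| > c_1.$$

From Lemma 2.1, with $M\le(b-a)/\varepsilon$: for all $\varepsilon_0 > 0$ there exists $c_0 > 0$ such that, for all $\varepsilon\in(0,\varepsilon_0)$,

$$P^\varepsilon\Big(\sup_{0\le l\le M-1}|V_l^{\varepsilon*}|\ge\frac c2\Big)\le c_0\,\varepsilon^{-1}e^{-c^2/(8\varepsilon)}.$$

Now, for $a+l\varepsilon\le s\le t\le a+(l+1)\varepsilon$,

$$|x_t - x_s|\le\varepsilon|F|\sup_{s\le r\le t}|x_r| + |w_t - w_s|.$$

Consequently,

$$\Big\{\sup_{0\le l\le M-1}\ \sup_{a+l\varepsilon\le s\le t\le a+(l+1)\varepsilon}|x_t - x_s| > c_1\Big\}\subset\Big\{\sup_{a\le t\le b}|x_t| > \frac{c_1}{2\varepsilon|F|}\Big\}\cup\Big\{\sup_{0\le l\le M-1}\ \sup_{a+l\varepsilon\le s\le a+(l+1)\varepsilon}|w_s - w_{a+l\varepsilon}| > \frac{c_1}4\Big\}.$$

It follows readily from Lemma 2.3 and the Markov inequality that there exists $c_2$ such that:

$$P^\varepsilon\Big(\sup_{a\le t\le b}|x_t| > \frac{c_1}{2\varepsilon|F|}\Big)\le c_2\,e^{-1/(c_2\varepsilon^2)}.$$

The sequence $\{\sup_{a+l\varepsilon\le s\le a+(l+1)\varepsilon}|w_s - w_{a+l\varepsilon}|;\ 0\le l\le M-1\}$ is i.i.d., and by the reflection principle,

$$P\Big(\sup_{a+l\varepsilon\le s\le a+(l+1)\varepsilon}|w_s - w_{a+l\varepsilon}| > \frac{c_1}4\Big)\le2P\Big(|w_\varepsilon| > \frac{c_1}4\Big).$$

An argument similar to the proof of Lemma 2.1 then shows that for all $\varepsilon_0 > 0$ there exists $c_3$ such that, for all $\varepsilon < \varepsilon_0$,

$$P\Big(\sup_{0\le l\le M-1}\ \sup_{a+l\varepsilon\le s\le a+(l+1)\varepsilon}|w_s - w_{a+l\varepsilon}| > \frac{c_1}4\Big)\le c_3\,\varepsilon^{-1}e^{-c_1^2/(32\varepsilon)}.$$

Finally, note that $C_\varepsilon\supset\{h(x_t) > 2c,\ a\le t\le b\}\cap\{\sup_{0\le l\le M-1}|V_l^{\varepsilon*}| < c\}$. By independence,

$$P^\varepsilon(C_\varepsilon)\ge P^\varepsilon\big(h(x_t) > 2c,\ a\le t\le b\big)\,P^\varepsilon\Big(\sup_{0\le l\le M-1}|V_l^{\varepsilon*}| < c\Big),$$

so that $\liminf_{\varepsilon\to0}P^\varepsilon(C_\varepsilon) > 0$. □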
4. The case of two intervals with $H_+H_- < 0$. Second step.

We now want to show the following: once we know that, with probability almost one, no zero crossing has occurred on [a, b], we also know whether $\{x_t > 0\}$ or $\{x_t < 0\}$, with a very small probability of error.

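For that purpose, using the notation introduced in the last section, let us define:

$$Z_o^\varepsilon = \frac1{b-a}\sum_{\substack{0\le l\le M-2\\ l\ \text{odd}}}\big(V_{l+1}^\varepsilon - V_l^\varepsilon\big)^2,\qquad Z_e^\varepsilon = \frac1{b-a}\sum_{\substack{0\le l\le M-2\\ l\ \text{even}}}\big(V_{l+1}^\varepsilon - V_l^\varepsilon\big)^2.$$

On $B_+$ we have $h(x) = H_+x$ and $x_{s+\varepsilon} - x_s = F_+\int_s^{s+\varepsilon}x_u\,du + w_{s+\varepsilon} - w_s$. Hence

$$V_{l+1}^\varepsilon - V_l^\varepsilon = \frac{H_+}\varepsilon\int_{a+l\varepsilon}^{a+(l+1)\varepsilon}(w_{s+\varepsilon} - w_s)\,ds + V_{l+1}^{\varepsilon*} - V_l^{\varepsilon*} + \frac{H_+F_+}\varepsilon\int_{a+l\varepsilon}^{a+(l+1)\varepsilon}\int_s^{s+\varepsilon}x_u\,du\,ds,$$

so that

$$Z_o = \frac1{b-a}\sum_{l\ \text{odd}}\alpha_l^2 + \frac2{b-a}\sum_{l\ \text{odd}}\alpha_l\beta_l + \frac1{b-a}\sum_{l\ \text{odd}}\beta_l^2.$$

Here we have dropped the dependence on ε and defined:

$$\alpha_l = \frac{H_+}\varepsilon\int_{a+l\varepsilon}^{a+(l+1)\varepsilon}(w_{s+\varepsilon} - w_s)\,ds + V_{l+1}^* - V_l^*,\qquad\beta_l = \frac{H_+F_+}\varepsilon\int_{a+l\varepsilon}^{a+(l+1)\varepsilon}\int_s^{s+\varepsilon}x_u\,du\,ds.$$

Note that $\alpha_l\sim N\big(0,2\varepsilon(1 + H_+^2/3)\big)$. The first term of $\alpha_l$ has variance $(H_+^2/\varepsilon^2)\cdot2\varepsilon^3/3$ and the second has variance 2ε. Both sequences $\{\alpha_l;\ l\ \text{odd}\}$ and $\{\alpha_l;\ l\ \text{even}\}$ are i.i.d. Call $M_o$ the number of odd integers in the interval [0, M − 2] and $M_e$ the number of even integers in the same interval, and define $p_o = M_o\varepsilon/(b-a)$, $p_e = M_e\varepsilon/(b-a)$. Note that $p_o$ and $p_e$ are both close to 1/2.

We are going to show the following.

Lemma 4.1. For any θ > 0 and $\varepsilon_0 > 0$, there exists c > 0 such that for any $\varepsilon\in(0,\varepsilon_0)$,

$$P^\varepsilon\Big(\Big\{\Big|Z_o^\varepsilon - 2p_o\Big(1 + \frac{H_+^2}3\Big)\Big| > \theta\Big\}\cap B_+\Big)\le c\,e^{-1/(c\sqrt\varepsilon)},$$

$$P^\varepsilon\Big(\Big\{\Big|Z_o^\varepsilon - 2p_o\Big(1 + \frac{H_-^2}3\Big)\Big| > \theta\Big\}\cap B_-\Big)\le c\,e^{-1/(c\sqrt\varepsilon)},$$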
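and similar estimates hold with $Z_o^\varepsilon$ replaced by $Z_e^\varepsilon$ and $p_o$ by $p_e$.

Let us first see what conclusion can be drawn from Lemma 4.1. Suppose, to fix ideas, that $H_+^2 > H_-^2$. Define:

$$C_+ = C_\varepsilon\cap\Big\{Z_o^\varepsilon\ge p_o\Big(2 + \frac{H_+^2 + H_-^2}3\Big)\Big\},\qquad C_- = C_\varepsilon\cap\Big\{Z_o^\varepsilon < p_o\Big(2 + \frac{H_+^2 + H_-^2}3\Big)\Big\}.$$

Note that $C_+\cup C_- = C_\varepsilon$ and $C_+\cap C_- = \emptyset$.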
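The statistic behind the decision between $C_+$ and $C_-$ can be tried on simulated data. The sketch below simulates the rescaled model $x_t = x_0 + \int_0^t f(x_s)ds + w_t$, $y_t = \varepsilon^{-1}\int_0^t h(x_s)ds + v_t$ with an Euler scheme on a grid of step ε/`substeps`. It takes [a, b] to be the whole record, forms the block increments $V_l$ of y, and returns $Z_o = (b-a)^{-1}\sum_{l\ \text{odd}}(V_{l+1}-V_l)^2$ together with $p_o$, to be compared with $2p_o(1+H^2/3)$. For simplicity, the example uses a single linear slope H on both sides of 0, so no event $B_\pm$ needs to be enforced. The discretization, the parameter values and the function names are assumptions of the sketch.

```python
import numpy as np


def simulate(f, h, eps, T, substeps, x0, rng):
    """Euler scheme for x_t = x_0 + int f(x) ds + w_t and y_t = (1/eps) int h(x) ds + v_t.

    Returns the observation increments on a grid of step dt = eps / substeps.
    """
    dt = eps / substeps
    n = int(round(T / dt))
    x = np.empty(n + 1)
    x[0] = x0
    dw = np.sqrt(dt) * rng.standard_normal(n)
    for k in range(n):
        x[k + 1] = x[k] + f(x[k]) * dt + dw[k]
    # left-point approximation of (1/eps) int h(x_s) ds, plus the increments of v
    return h(x[:-1]) * dt / eps + np.sqrt(dt) * rng.standard_normal(n)


def z_odd(dy, substeps, eps):
    """Return (Z_o, p_o): Z_o = (1/(b-a)) sum_{l odd} (V_{l+1} - V_l)^2, p_o = M_o eps / (b-a)."""
    M = dy.size // substeps                                  # number of eps-blocks in [a, b]
    V = dy[: M * substeps].reshape(M, substeps).sum(axis=1)  # V_l = y_{a+(l+1)eps} - y_{a+l eps}
    D = V[1:] - V[:-1]                                       # D[l] = V_{l+1} - V_l, l = 0..M-2
    odd = np.arange(D.size) % 2 == 1
    length = M * eps                                         # b - a
    return float(np.sum(D[odd] ** 2) / length), float(np.count_nonzero(odd) * eps / length)


if __name__ == "__main__":
    eps, T, substeps = 0.01, 20.0, 40
    for H in (2.0, 0.5):
        dy = simulate(lambda x: -x, lambda x: H * x, eps, T, substeps, 0.0,
                      np.random.default_rng(0))
        z, p = z_odd(dy, substeps, eps)
        print(f"H={H}: Z_o={z:.3f}, predicted {2 * p * (1 + H ** 2 / 3):.3f}")
```

The two slopes give clearly separated values of $Z_o$. This separation is what the threshold $p_o(2 + (H_+^2 + H_-^2)/3)$ exploits under (H2).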

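Proposition 4.2. For any $\varepsilon_0 > 0$, there exists k such that

$$P^\varepsilon(B_+^c\,|\,C_+)\le k\,e^{-1/(k\sqrt\varepsilon)}$$

and

$$P^\varepsilon(B_-^c\,|\,C_-)\le k\,e^{-1/(k\sqrt\varepsilon)}$$

for any $\varepsilon\in(0,\varepsilon_0)$.

Proof: Let us prove the first assertion. It suffices to estimate the quantity $P^\varepsilon(B_+^c\cap C_+)$. But:

$$P^\varepsilon(B_+^c\cap C_+)\le P^\varepsilon\big((B_+\cup B_-)^c\cap C_\varepsilon\big) + P^\varepsilon\Big(B_-\cap\Big\{\Big|Z_o^\varepsilon - 2p_o\Big(1 + \frac{H_-^2}3\Big)\Big| > \theta\Big\}\Big),$$

where $\theta = (H_+^2 - H_-^2)/18$. Indeed, as soon as $p_o\ge1/3$, on $C_+$ we have $Z_o^\varepsilon - 2p_o(1 + H_-^2/3)\ge p_o(H_+^2 - H_-^2)/3\ge2\theta$. The desired estimate then follows from Proposition 3.1 and Lemma 4.1. □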

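Proof of Lemma 4.1: Let us prove the first estimate. We need to estimate the following three events (again we drop the dependence on ε for notational convenience):

$$G = \Big\{\Big|\frac1{b-a}\sum_{l\ \text{odd}}\alpha_l^2 - 2p_o\Big(1 + \frac{H_+^2}3\Big)\Big| > \frac\theta3\Big\},\qquad H = \Big\{\frac1{b-a}\sum_{l\ \text{odd}}\beta_l^2 > \frac\theta3\Big\},\qquad J = \Big\{\Big|\frac2{b-a}\sum_{l\ \text{odd}}\alpha_l\beta_l\Big| > \frac\theta3\Big\}.$$

The existence of c > 0 such that $P^\varepsilon(G)\le c\,e^{-1/(c\varepsilon)}$ follows from Lemma 2.2, applied to the i.i.d. variables $\alpha_l^2/(2\varepsilon(1 + H_+^2/3))$ with $n = M_o$. Note moreover that $|\beta_l|\le\varepsilon|H_+F_+|\sup_{a\le t\le b}|x_t|$, so that

$$\frac1{b-a}\sum_{l\ \text{odd}}\beta_l^2\le\varepsilon(H_+F_+)^2\sup_{a\le t\le b}x_t^2.$$

Using Lemma 2.3 and the Markov inequality, we then deduce the existence of c > 0 such that $P^\varepsilon(H)\le c\,e^{-1/(c\varepsilon)}$. Since $2M_o/(b-a)\le2/\varepsilon$,

$$J\subset\Big\{\Big(\sup_{l\ \text{odd}}|\alpha_l|\Big)\Big(\sup_{a\le t\le b}|x_t|\Big) > \frac\theta{6|H_+F_+|}\Big\}\subset\Big\{\sup_{l\ \text{odd}}|\alpha_l| > \varepsilon^{1/4}\Big\}\cup\Big\{\sup_{a\le t\le b}|x_t| > \frac\theta{6|H_+F_+|\,\varepsilon^{1/4}}\Big\}.$$

Using Lemma 2.1 and Lemma 2.3, we deduce:

$$P^\varepsilon(J)\le c\,e^{-1/(c\sqrt\varepsilon)} + c\,e^{-1/(c\sqrt\varepsilon)}.$$

The result now follows from the three estimates above. □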

5.  The  case  of  two  intervals  with  H-  <  0.  Tliird  step. 

We  want  now  to  show  how  the  decision  between  {z^  >  0}  and  {z(  <  0}  can  be  made  from 
the  outputs  of  the  two  Kalman  filters  (A'F+)  and  (A'/L).  For  a  <  e  <  d  <  6,  let  us  define 
the  test  statistic: 

L.  =  /  -H.z:)  dy.  -  ^  -  [H.x;  H  ds. 

Using  the  representation  : 

1 . 

djft  =  -A,  ds  +  rfi/; 

A 

where  A,  ss  E*(A(ij)/3^j)  and  (i/f)  -the  innovation-  is  a  standard  Wiener  process.  L*  can 
be  rewritten  in  two  ways  : 

L,  |w+lj  -  H.z;f  ds  +  J  (,H+z*  -H-z-)di^,+ 

+  i  -  »-z:)(k.  -H^zt)  dz 

and  also: 

t, = -  i  d.  +  /*{ff+*j  -  H-»7) 

(5.2)  .  Jd 

+  \j^(H,.z*-H.z:)(h.-H.z:)d. 

Define  C*(a,e)  and  C7i(a,e)  as  in  section  4,  but  with  the  interval  [a,  6]  replaced  by  [a,e]. 
Define  moreover  : 


r  =  inf {t;<  =  a  +  /e;  f  >  e;  |y,+,  -  y<|  <  c}  A  6 
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where  c  is  the  constant  which  is  used  for  the  definition  of  the  event  Ct  • 
We  want  to  estimate  : 


E*  \h»  -  H+x+p  d«;C7;(a,e)j 


as  well  as  the  same  quantity  with  +  replaced  by  — .  We  deconpose  : 

/».  -  /f+x+  =  h.  -  ff+x+  +  ^f+(x+  -  x+) 

where  x^  is  the  conditional  mean  of  x,,  given  y„  in  the  following  filtering  problem  : 

<Ixt  =[/(ar«)l{«<e}  +  f+x,l(,>,j]d/  +  dwf ;  xq  given 
^  dyt  +  dwj*'*’;  yo  =  0 

Define  A,  =  F^x,  -  /(x,),7,  =  ff+x,  -  h{x,), 

aty/t  j  /IV*  ,  /IV*  I  .  /IV*  \ 

A,  dw,  -  -  X]da  +  -  J  y*  dvl  - jf  7?  ds  j  ,  a  <  f  <  6. 

Then  initial  law  on  (D,^),  and  [wf),  {«*’"*’)  are  niutually 

independent  standard  Wiener  process  under  P'*'. 

d/if  —XfZf  dwf  +  •”7i‘^i  dvf ,  f  ^  e;  Zg  —  1 

dyt  =:-/i(xi)di  +  dwf 
iT 

It  follows  from  the  theory  of  filtering  that : 

c 

where  Zf  =  E*(Zt/yt),hf  =  E'*'(h(xt)/yfj,y^  =  E'*'{yt/yt).  Clearly,  if  we  define  x^  = 

E^ixfjFf), 

dZf  =  -Zf{H^xt  -hf)dvl 
€ 

=  «p  (i  -  k)  -  5^  !«+*?  -  A.I’  *) 

But  Zr  —  EP{ZrlyT)-  It  then  follows  from  Jensen’s  inequality  for  conditional  expectations 
and  the  fact  t^at  C^(a,e)  €  3^*: 

E-  y’  \H^i*  -  i,p  *;  Ci(,,  t))  <E’f^  (J’  \F+x,  -  /(..)|»  </,;  C^a,  e)) 

+  E*  (£ !//+!,-  (i(*.)l’  «'•;  c;(«,  cA 

It  now  follows  readily  from  Proposition  4.2  : 
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Lemma  5.1.  For  any  eo  >  0,  there  exists  k  s.t.  Vf  €  (0,eo)i 


E*  |/7+r+  -  ds;CXi<i, «)  0 C7.(a. 6)^  <  ke-^f^ 

and  the  same  result  holds  with  +  replaced  by  —U 

We  need  now  to  estimate  the  difTercnce  \zf  —  <  «  <  6.  Note  that  {xf,i  >  e} 

and  >  e)  are  solutions  of  the  same  linear  filtering  problem  with  different  initial  laws, 

the  second  one  being  non  gaussian.  Let  y*  =  —  ye;c  <  s  <  t},t  >  c.  Since  and 

a(xe)  vy/  are  conditionally  independent  given  ff(xe),  for  <  >  e, 

(5.4)  if  =  E+[E+(xt/x,,yt)/yt]. 

Define  x^,  =  E'^{xt/xt,yf).  x^j  is  the  output  of  a  Kalman  filter.  More  precisely,  we 

ll2tVC* 

dx+,  =/+x+,  dt  +  e"^i2*.i//+(dyi  -  e“*i/+x+,  dt),i  >  c;x+,  =  x. 

=2F+Re.i  +  1  >  e;Re,e  =  0. 

Define  R+(t)  =  Rt,tH^.  It  is  easily  seen  that  3ts.t.  Vt  >  d, 

(5.5)  |/f+-A:+(0l  <*«-*". 

Moreover, 

-  <i)  =(f’+  -  e-‘/<+/f+)(x+  -  x+ )  dt 

(5.6)  +  (A'+  -  /C+(t))(Ji/,  -  r ‘i/+x+,  dl),t  >  e 

x+  -  x+,  =x+  -  x«. 

Since  K+H^.  >  0,  it  follows  from  (5.5), (5. 6)  that  there  exists  k  s.t.  Vs  >  0,  Vt  €  [d,6], 


Finally,  using  (5.4),  we  obtain  that  B*  la/”  —  rfl  <  Jte“^^*,  Ve  >  0  and  for  some  k.  _ 
Since  P*  and  P'*’  coincide  on  a  subset  of  B+(a,b), 


P* 


^tl^ds>0}nC^(a,b) 


) 


<P*(B;(n,5)nC+(a,5)) 

+  P+(1  /*li+-i+l’dl>«) 

€  Ji 


It  then  follows: 
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Lemma  5.2.  For  any  ff,eo  >  0,  there  exists  k  >0  s.t.  Ve  €  (0,eo), 

P*  ((i  I’  *  >  «)  n  Ci(c,  t) j 

FYom  Lemmas  5.1  and  5.2,  and  Ihe  analogues  with  +  replaced  by  — ,  we  deduce: 
Lemma  5.3.  For  any  0,eo  >  0,  there  exists  k  >0  s.t.  Ve  €  (0,eo), 

and 

P*  >  d)  nci(a,6)j  < 

Theorem  5.4.  Vffo  >  0,31:  >  0  s.t.  for  any  e  6  (0,eo)» 

<  0)  nc;(a,6))  < 

and 

0}  nC’l(a,6))  < 

The  proof  of  the  theorem  relics  on  the  following  Lemma: 

Lemma  5.5.  Let  Zt  =  —  /f_a:J';3Q  >  Os.t.Veo  >  0,3ib  s.t.  Ve  €  (0,fo)i 

P*  J  ds  <  <  e“‘^* 

Proof  of  Lemma  5.5  foutlincl:  We  have: 

dzt  =(P+  -  e-^H^K.)z,  dt  +  (P+  -  F..)H.xT  dt 

+  {H+K^  -  //-/<-)[-  ~  dl  +  dt^l 
dz,  =(F+  -  e-‘P+/C+)z,  dl  +  {F+  -  F.)H.xJ  dt 

m  1  /I/  t'  IT  t'  ~  H—Xf  . 


dt  +  rfi/f  ] 


It  follows  from  the  variation  of  constants  formula  that  both  on  C.^.{a,b)  and  on  C-{a,b), 
Zt  is  the  sum  of  three  terms  Z|  =  +  z,*^  +  z[®\  where  zf  is  of  order  y/e^  z^^  is  of  order 

e  and  the  third  one  is  exponentially  small.  The  first  term  is  the  crucial  one,  which  solves: 

dz\''*  =  (P+  -  A’_)z[‘>  dl  +  (^+ A-^  -  H-K.)  dv\ 
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with  initial  data  2*^^  having  the  invariant  distribution.  By  introducing  a  new  time  r  such 
that  er  =  (  — e,  the  required  estimate  reduces  to  a  large  deviations  estimate  for  the  ergodic 
process  in  the  time  scale  r,  see  c.g.  Varadhan  [l2]o 

Proof  of  Theorem  5.4:  Let  us  prove  the  first  estimate  only.  We  now  rewrite  L,  for  the 
case  u  €  b)  as: 

L.  =i  /*  IH+I+  -  ds  +  -  H.t:)  d,/. 

+  i  jf*  *  +  i  d. 

We  first  show  that  the  sum  of  the  last  two  terms  is  nonnegative  with  very  high  probability. 
Indeed,  it  is  bounded  below  by  : 

A-  =  ^  *  -  I  y*  \k  -  d.. 

For  any  $  >  0, 

p-ax  <  o)nci(«.‘))  f|f  y*  I*.  -  ‘I’  > 

It  then  follows  from  lemma  5.3  and  5.5  provided  9  is  chosen  adequately  that; 

P*({.Y<0)f]C;)<c-‘/'^.  for  some  k  and  e  small  enough. 

Let  us  now  consider  the  first  part  of  L,.  Let  us  define  Mt  =  We  need  to 

estimate  the  quantity  P*{Mt  <  —  <  M  >t  /^y/c)-  From  Lemma  5.5,  it  suffices  to  estimate 
P*(i4),  where 

A  =  {Mt<-<M>t  /4y/i]  p|{<  M  >,>  a) 

Using  the  facts  that  E*[exp{XMt  —  (A*/2)  <  M  >f)]  =  1  and  on  i4,  if  A  <  0: 

.  XM,  -  (AV2)  <  M  >t>  [-X/4y/€-X^/2]  <  M  >,>  [-X/4y/€  -  X^/2]a 
Now  chosing  A  adequately  we  have  that  for  any  Jb  >  0  and  e  small  enough, 

I 

The  proof  is  compIeteO 

Therefore,  knowing  that  we  are  on  Cc,  L(  is  a  good  test  statistic  to  decide  whether 
we  are  on  or  on  Ci(a,6),  i.e.,  essentially  whether  {xi  >  0,a  <  t  <  5}  or  {xi  < 

0;a  <  t  <  6}.  Note  that  Lemma  5.3  proves  that  zf  (resp  x^)  is  then  a  good  estimate  of 
^  b.  It  follows  moreover  from  the  above  that  the  variance  of  the  conditional  law 
is  of  the  order  of  e. 
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6.  Summary  of  the  procedure  in  the  first  case^  and  the  general  case. 

Let  us  first  summarize  the  procedure  in  the  case  studied  so  far  of  two  intervals  with 

H^H.  <  0. 

1-  At  each  time  Jbff,  k  6  JV,  we  compute  —  y*t],  and  check  whether  or  not 

its  absolute  value  exceeds  a  given  quantity  e. 

2-  As  soon  as  the  first  test  is  positive  over  a  certain  time-interval,  we  start  running 
the  Lc-test,  possibly  in  a  sequential  way  . 

3-  As  soon  as  we  have  an  answer  from  the  L'c-test,  we  follow  the  corresponding  Kalman 
filter  (one  might  continue  to  run  the  L(-test,  in  order  to  correct  a  possible  wrong  decision). 

Note  that  before  we  get  any  answer  from  the  tests,  the  estimate  of  xt  is  zero.  During 
a  given  time  interval,  this  is  not  a  very  good  estimate.  But  that  situation  seems  to  be 
inevitable.  Indeed,  numerical  results  [3]  indicate  that,  just  after  Xt  has  crossed  zero,  the 
conditional  density  has  two  peaks  on  both  sides  of  zero,  and  it  takes  some  time  before  one 
of  the  peaks  disappears. 

The  reason  why  we  do  not  use  the  results  of  section  A  in  order  to  build  a  test  for  the 
choice  between  {xi  >  0}  and  {xi  <  0}  is  that  a  test  based  on  the  approximation  of  the 
quadratic  variation  of  an  approximate  derivative  of  yi  would  not  be  very  robust.  Similarly 
one  might  wish  to  replace  the  test  based  on  the  values  of  £~Hz/(fc+i)c  ~  Vht)  for  several 
consecutive  ib’s  by  a  test  using  the  outputs  of  the  Kalman  filters.  Indeed,  on  can  show  that 
the  difference  h(X|)  —  H^xf  is  always  at  most  of  the  order  of  \/i.  Unfortunately,  due  to 
the  presence  of  the  local  time  term  in  the  expression  for  h(x|),  we  were  not  able  to  get  a 
good  enough  estimate  for  the  probability  of  error  associated  to  sucli  a  test. 

Let  us  now  discuss  briefly  the  case  where  >  0,  and  the  general  situation.  In 

the  case  of  two  intervals,  jR_  and  with  >  0,  h(x)  is  one  to  one,  and  clearly  no 

test  is  needed  to  decide  where  is  X|.  That  decision  is  in  that  case  obvious  from  the  values 
of  xf  and  xj" .  One  might  also  in  this  case  invoqiie  the  result  of  Picard  [8]  . 

Finally,  in  the  general  situation,  we  have  to  detect  each  crossing  by  X(  of  the  local 
maxima  or  minima  of  the  function  h,  and  choose  among  the  several  Kalman  Alters  which 
one  to  follow.  One  can  either  construct  L(-type  tests  only  between  adjacent  intervals,  or 
else  betwen  any  pair  of  two  distinct  intervals,  depending  on  the  confidence  one  has  in  the 
decisions  previously  taken.  The  latter  obviously  depends  on  the  lengths  of  the  various 
intervals,  as  well  as  the  difference  between  the  values  of  adjacent  |/fi|’s.  Let  us  Anally 
remark  that  for  the  case  of  two  intervals  R-  and  R^,  the  problem  can  still  be  solved  if 
H+  =  — if-,  provided  F+  F_.  However  the  technique  is  different,  and  we  do  not  present 
it  here. 
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1  Filtrage  aproch4  pour  le  probl^me  non  llii^aire 
discr4tis4,  avec  petit  bruit  d ’observation 

1.1  Introduction 

On  consid^re  le  probl^me  suivant  : 

On  a  un  signal  X.  solution  de  I’EDS 

<LX  =  b{X)dt  +  £''a{X)dw^  ,  X(0)  =  $  (l) 

et  on  dispose  de  I’observation  v4rifiant 

dY  =  h{X)dt  +  £^-^dw^,Y{0)  =  0  (2) 

oil  w^,w^  sont  des  processus  de  Wiener  standard  ind4pendants  et  ^  est  une 
v.a.  independante  de  to*  et  tu*.  Le  parametre  e  est  suppose  petit  et  0  <  -7  <  |. 

On  consid^re  la  discretisation  la  plus  simple  de  ces  equations  : 

Xk+i  =  Xk  +  b{Xk)^t  +  £''<t{Xic)V At  Wk+i ,  Xo  =  ^  (3) 

y*  =  h{Xk)  +  -j=—ibk  (4) 

oil  Xk  est  une  approximation  de  Xt^  (tk  =  kAt),  Wk  et  tZi*  sont  des  bruits 
blancs  gaussiens  standard  independants  et  ^  une  v.a.  independante  de  Wk  et 

Wk. 

Soit  {.Sr*}  le  filtre  optimal  pour  le  problfeme  discret,  i.e., 

Xk  =  ElXkM  ,  Y^  =  a(y,;  i  =  0, 1,  •  •  • ,  k). 

Puisque,  dans  le  cas  general,  la  determination  de  Xk  presente  une  grande 
complexite,  on  aimerait  pouvoir  construire  une  “bonne  approximation”  de 
ce  filtre  representant  un  compromis  entre  les  couts  en  temps  de  calcul  et  la 
precision  des  resultats.  Soit  {M*}  une  telle  approximation,  4  preciser  plus 
loin,  au  cours  de  ce  travail.  On  est  interese  par  la  vitesse  de  convergence  de 
M*  vers  .ST*  quand  e  devient  “petit”,  Quand  At  — >  0  on  doit  pouvoir  approcher 
le  probieme  de  filtrage  en  temps  continu  (1  -  2). 
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1.2  Construction  des  Filtres  Approch4s  (cas  lin4aire) 

1.2.1  £tude  de  la  vitesse  de  convergence  de  I’erreur  quadratique 
moyenne  pour  quelques  filtres  propo848 


On  supposera,  dans  la  suite,  que  'y  =  0  et  on  se  situe  dans  le  cas  liniaire. 
On  consid^re  le  syst^me 

f  ^k+i  =  (1  +  bAt)Xk  +  <T\/ At  iVk+u  ^0  =  i 


y*  =  hXk  + 


y/At 


yo  =  0 


(5) 


4tant  maintenant,  une  v.a.  gaussienne. 

On  rappelle  le  fait  bien  connu  de  que,  dans  le  cas  lin4aire,  I’estimation 
optimale  est  donn^e  par  des  Equations  de  dimension  finie,  les  Equations  du 
filtre  de  Kalman  : 

^k  —  (l  +  + -j5 — - (y*  -  ^(1  + 

Zi  +  fi^Pkik-i 

=  (1  -f  bAt):kk.i  +  ^  ^~k^tpk]k  I  ~  ^  bAt)Xk.i)  .  (6) 

2  a*  + 


Pfc+i)*  =  (1  +  bAt)  pfc|*_i  +  a^At  - 


3^  +  b.^Pk\k-i 


{l  +  bAtr^^Pk^k.i 


+  a*  At 


(1  +  bAt)^e^pk\k-i  ^  ^2 

s*  +  h^Atpk\k-i 


+  <7  At 


(7) 


En  outre, 


Pk  =  Pfcl*_l 


Pfc|fc-i^^ 

^  +  h^Pk\k-i 


a7P*I*-i 
S  +  h^Pk\k-y 
g^Pfc|fc-i 

e*  +  h‘^Atpk\k-i  ’ 


(8) 


avec  les  notations: 

A 


Pk  =  E[{Xk  -  ^kf]  ,  Pkik-i  =  EliXk  -  et  ^kik-i  =  E\Xk\Yt% 

Notre  but  est,  ne  I’oublions  pas,  de  construire  un  processus  {M*}  qui  ap- 
proche  {X*}. 
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A)  On  commence  par  determiner  la  covariance  de  I’erreur  de  provision  Pfc|fc_i 
dans  une  situation  stationnaire,  ce  qui  est  Equivalent  E.  calculer  la  valeur  sta- 
tionnaire  de  la  covariance  de  I’erreur  d’  estimation  p*  (notEe  p^),  puisqu’on 
passe  d’une  k  I’autre  par  I’expression  (8). 


Soit  p,  la  valeur  stationnaire  de  Pk\k-i- 

(1  +  tAoy,.  , 

e^p,  +  h^Atp^  =  (1  +  bAt)^e^p,  +  c^e^At  +  a^k^At^p, 
h^Atp^  +  [e*  -  (1  +  6At)*e*  -  <7*/i*At*)p,  -  tr^e^At  =  0 
-  [e*  -  (1  +  JAi)*5*  -  <y®A*At*]  +  r(s,  At) 

^  2h^At  ’ 


oil  r(e,  At)  =  [(e*  -  (1  +  6At)*£*  -  +  Aa^hh^At^]  ^ 

=  At  [{(26  +  6®At)E*  +  a^h^Aif  +  Aa^h^e^]’ 

=  Afp(e,At), 


i.e. 


At[(26  +  6*At)e*  +  a^h'^At]  +  Atp{e,  At) 

p,  _  __ 

(26  +  6*At)e*  +  a^h^At  +  p(e,  At) 


2A* 


Le  gain  stationnaire  sera  done, 


hp,At 


hAt  . 
— 


e*  +  h^Atp,  e 

On  notera  0k  ie  gain  k  I’instant  tj^  ; 

A  hAtpk\k-i  _  ^At 

*  ~  e*  +  A*Atp*|*_i  “  e* 


Pk- 


B)  On  considEre  un  schEma  qui  rEsulte  de  (6)  en  rempla;ant  0k  pa.r  sa  valeur 
stationnaire  0,  ou,  d’une  fagon  plus  gEnErale,  par  une  approximation  de  cette 
valeur.  Dans  la  suite,  d  dEsignera  done  soit  0,  soit  une  approximation  de  0,, 
selon  le  cas  explicitE. 

On  considEre  alors  le  processus  {M*}  donnE  par  I’expression 

Mk+i  =  (1  +  bAt)Mk  +  ^(Vfc+i  —  ^(1  +  6At)Afjt).  (9) 

On  obtient  ainsi. 
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I 


i) 

■^fc+i  —  Mic+i  =  (1  +  bAt)Xk  +  0k-«-i(yk+i  ~  ^(l  +  bAt)Xk)  —  (l  +  bAt)Mk 
~^iyk+i  —  ^(l  +  bAt)Mk) 

=  {I  +  bAt){Xk  -  Mk)  +  (0k+i  ~  ^)yk+i 
-h(\  +  bAt){$k^kXk-iMk) 

6tant  donn^  que  $k+i  =  S  +  (tfk+i  —  i), 

Xk+i  —  Mk+i  —  (l  +  iAf)(l  —  h9)(Xk  —  M^k) 

+(^*+1  -  ^)(y*+i  -  ^(1  +  bAt)Xk), 
oil  y*+i  —  A(l  +  bAt)Xk  =  Vk  est  I’innovation  : 

Ev\  =  h^Pk+i\k  +  —  • 

Soit  ffk  =  $k  —  i- 

E[(Xk+i  -  Mk^iY]  =  (1  +  6Af)®(l  -  -  Mkf] 

+J?*+i(AV*+i|*  +  — ],  (10) 

et 

ii) 

-X’fc+i  -  Mk+i  =  (1  +  bAt)Xk  +  ay/^Wk+i  -  (1  +  bAt)Mk 
-^(yfc+i  -  Ml  +  bAt)Mk) 

=  (1  +  6At)(l  -  li5)(A-k  -  Af*)  +  ay/^twk^i 

-d{ha\/Aiwk+i  + 

puiaque  y*+i  =  hXk+i  + 

=  A(l  +  bAt)Xk  +  ha-</AtWk+i  H — ^=®fc+i 

V  At 

=  (1  +  6At)(l  -  hd){Xk  -  Mk)  +  <tV^(1  -  hd)wk+i 

D’oii 

£((^■*+1  -  Affc+i)’l  =  (1  +  6Af)*(l  -  hd)^E[{Xk  -  Mfc)*]  +  <7*At(l  -  h9)^ 


(11) 


iii)  Soit  Til,  =  Ok  -  0,  et  /t*  =  Pk\k-i  -  P,- 

Vk+i  =  Ok+i  -  0, 

_  hAtpk+nk _ hp,At 

~  ff*  +  h^Atpk+nk  ff*  +  h^Atps 

_  feAf[g^Pfc+in  +  h^Atpk+i\kP,  -  g^P»  -  fi^^tpk+nkP,] 
~  [g*  +  h^Atpk+i\k]  [c’  +  h?Atp,] 

hAte^ 

~  [g2  + /i*Afpfc+i|fc][g*  + 

et 

Mfc+i  =  Plk+l|*  -  P. 

_  (1  +  bAtye^pk\k-i  _  (1  +  bAt)^e^p. 
g*  +  fi^Aipkik-i  e*  +  h^Atp, 

_ +  bAtys^ 

[g2  +  A2Atp*|fc_i|[g*  +  k^Atp,] 

_  (1  +  bAt)^£* 

[g*  +  A*Atpfc]*_iI(g*  +  A^Atp.j^* 

_  (1  +  6At)*g* 

[g*  +  h^Atpk  +  fe*Afp,][g*  +  /i*Atp,]^* 

Co 

=  - P-k 

ClPk  +  C2 

oil 

Co  =  (1  +  6At)*g^ 

C|  =  h^At{e^  +  h^Atp,) 
cj  =  (g^  +  h^AtptY 


(12) 


(13) 


Done, 


Co 

/i*+l  =  — 


C,  +  ^ 
Cl 


Suivant  un  raisonnement  par  recurrence  on  trouve  I’expression: 

Cq  C2 

Pk  = - jrj - Po  =  (— ) - - Po 

ClPo  53  Ca'‘'‘4  +  4  *  53 (— )‘  + 

IsO  tsO 
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D ’autre  part, 

ffk  =  {8k-0,)  +  {0,-d) 

=  Vk+ 

C)  Soit  At  =  g°  ,  a  >  0.  Notre  but  est  de  faire  une  discussion  de  la  vitesse 
de  convergence  du  schema  (9)  pour  les  diff^rentes  valeurs  de  a. 

On  rappelle  que  : 

_  (26  +  6*At)e*  +  a^h^At  +  Pa(c,  At) 

- - 2^5 


Done 


Pa(e)  =  [((26  +  6*At)e*  +  a^h^At]^  + 
p.  =  0  (e“  V  s) 


P,  >  c(g°  V  e)  . 

Alors,  en  supposant  que  /to  >  0  (i.e.  la  valeur  initial  de  la  covariance,  po> 
est  une  constante  ind^pendante  de  c),  de  (13)  vient  que  : 

Pk  <  (— )*/to, 

Cj 


Co  _ 

(1  +  66“)  V 

C2 

[e*  + 

<11 

j  Co  _  [s’  4 
C2 

puisque 


C2  [c*  + 

e'^lh^p,  -  6g^l[g»  +  +  (1  +  6g°)e»] 

[e*  +  /i’e“p,)*  ^ 

D’autre  part,  les  formules  (10)  et  (11)  donnent  : 


6 


E[{X,,^1  -  =  (1  -  A)£;[(i:*  -  M*)*]  +  (14) 


oil 

k+l 

1=0 

1  -  A 

=  (1  +  60^(1  - 

Bk+i 

A  o  e*  +  /i*e“p*+iifc 

-  '^*+1 

# 

Bo 

^  E[{Xo-Mo)% 

=  {l-A)E[{X,-M,y]  +  D 

k+l 

=  (1  -  A)*+*^[(Xo  -  Mo)*]  4-  x:(l  -  A)'‘^'-'D 

i=l 

=  (l-A)‘+‘E[(Xo-Mo)*|  +  r>E(l-^)' 

*•=0 

Si  A  >  0, 

£;[(X*+1  -  =  (1  -  A)‘^‘£;[(Xo  -  Mo)*]  +  ^  (15) 

A 

oil 

D  t  a*£r“(l  -  hif  +  ^r. 


I.  Supposons  d’abord,  pour  simplifier,  qu’on  prend  comme  approximation 
du  gain  0^  le  gain  stationnaire  9,,  i.e.  i  =  9,  (voir  le  schema  (9)).  On  d^duira 
par  la  suite  les  r^sultats  pour  d’autres  approximations  9  k  pr^ciser. 

Dans  ce  cas  1^  : 


i) 

1-A  =  (1  +  6e“)*(l  -  W,)* 

(1  +  6g°)*g< 

(ff*  +  A’ff®p,)* 


7 


=  ^^l-A 

C2 

_  2  +  h^e-^Pi+ni 

^i+l  —  ’?t+l 


/ic“+* 


rMi+i 


[(£*  +  h^e^pi+nille^  +  fe*c“p.] 

_ _  2 

[e*  +  A*g“p,>ii,l[e*  + 


g®  4-  A*g“pi+i|i 


et,  puisque  pi+i|,  >  p,, 
fc*g“+‘ 


Bi+i  < 


[g2  +  /i*C“Pj] 


ic?«  =  fl;*;*. .  »!■  B 


[g2  +  fe*g“p.j»  ■ 


et 


Mil  =  [(1  -  A)*1‘^H.>iM*  ,  ou  ffi+i  = 


1  2 


C2 


[ciMoEj=o(^V  + 

et  Ho  est  tel  que  E[(Xo  -  -Mo)*)  =  BHopl- 

Reprenant  I’expression  (14), 

£;((Jf*+i  -  Mfc+i)*]  <  (1-A)£?[{^*-M*)*]  +  B[(1-A)Y^^^*+iM^ 

*+l 

k+l 

E 

»=0 


<=o 


=  nlB{l  -  A)"*'  E(1  - 


(16) 


Majoration  de  la  s4rie  : 

^•=0  '  k 


Hi+i  = 


cj 


Cl  Mo 


1  - 

I  —  SSL 
ej 


+  Cj 


^  1 

avec  —  <1 
C2 


c,(l  - 

_ 52_ 


C.l^ll  -  (^)"‘l  +  C2(l  -  7) 

Cj  C2 
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«2(1  - 


CiUo  +  C2{1  -  — )  -  CiIIq{—) 

L  Cj  C2 

CiMo  Cj 

—  ff2_Z_f2.\* _ ^ _ 

^  ClfiO  '  1(1  ^  fLUf®)  _  (£°)‘+1]2 

CiflQ  C2 


^0\i+l 


=  a 


(u  -  r<+i)» 


ou 


Avec  ces  notations, 


A  C2  ~  Co 
O  =  - 

Cl  Mo 

A  ,  C2  -  Co 

tt  =  1  H - 

CiMo 

A  Co 

r  =  —  <  1 
C2 


*+l  *+l  _*• 


On  commence  par  calculer  les  ordres  de  grandeur  de  a  et  de  u. 


a  = 


-  (g"  -I-  -  (1  +  ^c°)  V 


(17) 

(18) 


Soit  Nnm  le  num^rateur  de  a. 

Num  =  (e*  +  AVp,)*  -  (1  +  60*^^ 

=  (e*  +  A’e^p.  -  (1  +  +  A*e"p.  +  (1  + 

= 


•  Pour  Qt  >  1,  p,  -  =  ^(c^)>  ce  qui  entraine  ^ttm  =  et  done 

hr 


=  0(e) 


1 


Quant  4  tt,  on  voit  rapidement  que 


«  =  1  + 


(p»  -  +  (1  +  *«■“)«■*! 


=  1  +  0  (c)  et  u  >  1. 

On  peut  faire  la  majoration  suivante  : 

Jk+l  _«■  00  _» 

V — I _  <  y — - — 

.it  (“  -  ^’Y 
»■* 

<  /  7 - ^dx 

Jo  («  —  r*)* 

-  jiL_L 

log  r  ua 


Done 


*+^  —1  1 

<  a^- - 

.=0  log  »■  ““ 


+  Ho 


-1  1  rr 

=  aj - +  Ho 

logr  u 


L’expression  (16)  vient  alors, 


-1  1 


E{{X,^,  -  <  4B{1  -  - +  Bo) 


log  r  u 


Etant  donn4  que 


4*e“+<  (p,  —  +  K^e'^p,  +  (1  +  be°)e^ 


Ba  _  e*  4-  h?e'*p. 


Ho{e^  +  h'^e’^p,) 


1  + 


(p«  -  ^g^)[g^  H-  h^s^p,  +  (1  +  6g°)g’^] 

^  _ 4^£°'^*(p,  -  ^g^)[g^  +  ^^g^P,  +  (1  +  6£°)g^] _ 

[g*  +  4*g<*p,]*[Ato(e’  +  ^*c“P#)  +  (p#  -  +  (1  + 

=  0(g‘-^). 


et 


-1  1 

- <  - 

log  r  1  -  r 
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on  obtient  : 


Ba 


<  e  et  BHo  =  -^E[{Xq  - 
-logr  Mo 

Done,  si  a  >  1, 

E[{Xk+i  -  Aifc+i)*)  <  cexp{-c(A:  + 
et,  puisque  =  (fc  +  l)^®. 


(19) 


■£^[(-^*+1  -  Af*+i)*j  <  cexp{-c<t+iA} 


•  Pour  a  <  1,  reprenant  la  formule  (14),  on  obtient: 

*+i  - 

.=0  «2 
At+1 


avec 


<  D^(— )*'‘^^  ‘(— )^’m^  puisque  m.  <  (— )'Mo 

,_0  C2  Cj  C2 

car  Hi  <  1 

-  *+i  , 

<=2  ,=0 


B 

A 


[52  + 


e'*[h^p,  -  6e*][c®  +  h^e^p,  +  (1  +  6e“)e*] 


(e*  +  h}e'‘p,Y 
hh* 


[c*  4-  h^€’*p,\[h?p,  -  6e*][E*  +  h^e^p,  +  (l  +  6e“)e2] 


=  0{e*  *“)  ,  puisque  h^p,  —  6s*  >  c(e“)  . 

Done,  si  o  <  1, 

i.e. 

E\{Xk+i  -  Mk+i)^]  <  cexp{[4(^^  +  1)(1  -  a)  -  o]logs} 


ii)  Quant  k  I’^cart  par  rapport  au  signal  {X^},  puisque 

n  _  12  I  ]2 

4-  k^e“pt  £“  e*  +  h^e^p. 


[e2  +  /i*e«p.]» 


kv  +  fc*p;i, 


A  ff"{/i’p,  -  +  (1  + 

[h^p,  -  6e^][e*  +  h^e'^p,  +  (1  + 


_  f  0(e)  ,  a  >  1 
1  0(e2-“)  ,  a  <  1 


et  done 


si  a  >  1 


-  Mk+i)^\  <  exp{-ctifc^ii}El(Xo  -  Mof]  +  ce|  (20) 

•  si  a  <  1  , 

I  E\{Xh^i  -  M,.,i)^)  <  exp{-ct,^i^}E[(Xo  -  +'ce^-°|  .  (21) 

Remarque  1.1  On  peut  constater  que,  quoique  soit  a  >  0,  les  valeurs  sta- 
tionnaires  de  E[(Xk+i  -  Mk+i)^]  et  E[[Xk+i  —  sont  de  meme  ordre  de 

grandeur  done  ^utilisation  du  filtre  approehi  (voir  le  seh^ma  (9))  est  justifi^e. 


II.  On  eherehe  maintenant  les  expressions  plus  g6n4rales  qu’on  obtient  quand 
on  utilise  une  approximation  i  du  gain  asymptotique  0,. 

Pour  la  construction  de  cette  approximation  on  peut  proc4der  de  deux 
manidres: 

II.l.  On  utilise  un  d4veloppement  limits  de  pa{^)  pour  construire  une  ap¬ 
proximation  p  de  p,  et  on  obtient  done  une  approximation  0  de  0,,  par  0  = 
hpE‘‘ 

e*  -I-  h^e<^p  ' 

Supposons  que  p,  —  p  =  0  (e”*)  ,  m  >  1,  i.e.  p  est  une  approximation  de  p, 
d ’ordre  s'". 

Alors, 


ILl.i) 


EKXfc+i  -  Mfc+i)’)  <  E(1  -  , 


oil,  on  rappelle, 


_  (1  + 
[e*  + 


B.  =  »?.? 


+  fe^g**Pi|.-i 


_ fe»g°-^* _  , 

(g*  +  /i’e“p,i.-i][e*  +  fc*g“p]*^* 

_ h^e^+* _  2 

“  [g*  +  fc*g“p,][g*  +  fc*g“pP^* 

=  ,  £tant  B  =  7— - -7 - 7,  — 

’  r«-2  J.  )>2<-a«  Uc2  - 


(g*  +  ;i*g“p,][g*  +  fe*g“p]*  ■ 


Done 


•  sO 

k+l  fc-t-l 

<  2S  53(1  -  A)*+*‘V?  +  2cBg”"  53(1  -  A)*+*-’ 

«=0  1=0 

fc-fl  k-fl 

=  2B  53(1  -  A)*+‘-m.-  +  cBg*"*  53(1  -  A)’ 

»=0  t=0 

fc+1  B 

<  2B  53  (1  -  A)*-*^'-’/!?  +  cg’"‘-5- 

<=0 

•  Pour  a  >  1,  en  utilisant  les  majorations  dans  Ic  paragraphe  C)I.,  on 
obtient: 

k+l 

f;[(x»+x  -  Affc+x)*!  <  2BE(i-^r'~’((i->^)*r^.M?  +  ce* 

t=0 

*+*  (\  -  A\* 

=  2Bm?(i  -  a)*+^  e  [h--^rg>- + 

tsO  ^ 

(a)  si  p,  -  p  >  0  ,  alors  1  —  A  >  1  —  A  done 


B((ir»+x  -  M*+xn  <  2Bm»(1-A)‘+»E(1-^)‘J?.  +  ‘=^’ 
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to  ^  u,..!  to 


I 

1 


<  25MS(l-A)*+Mar^-  +  ^o) +  «*'"! 

logru  A 

oii  a,  u  et  r  sont  donnas  par  (17). 

(b)  si  p,  —  p  <  0  ,  alors  1  —  A  <  1  —  A  done 

k+l  B 

<=0  ^ 

<  25p*(l-A)‘+^(a^i  +  ^o) + 

logru  A 


•  Pour  a  <  1  , 


•’.n  1  A  Ji. 


<  2m’(1-A) 


3\fc+i 


5 


(1  -  A)* 


2m  5 

+ 


1  - 


1  -  A 


Quelques  calculs  rapides  nous  donnent: 


_ _ 

[e*  +  /i*e“p,)(e’  +  fc®e“p]* 

(P.  -  + '^Vp.  +  (1  +  6s  V] 

+  h^e^P,)  +  (p,  -  +  (1  + 

0(e““‘)  ,  si  a  >  1 


(1  -  A)» 
1  -  A 


5(1 -A) 

(1  _  A)  -  (1  -  A)» 

_ _ (1  +  frg°)V 

[g^  +  ^^g**p,]|g^  4-  fe^g°p]^  [g^  +  fc^g°p]^ 

(1  +  be^ye*  (1  -f  fcg°)*g» 

(g*  +  ^*g®p]*  [g*  +  fc*g®p,)< 

_ h^e“+* _ 

_ (g^  +  ^^g”p«][g*  +  fc*g**p]^ _ 

[g»  +  hU^p,]*  -  (1  -t-  6g")«g*[g»  +  fc»g°pl» 

[g*  +  ^*g“p,]* 


_ k^£<*+* _ 

[c*  4-  +  c®(l  4-  6e“)(ff®  4-  A®e“p)| 

_ [g^  4-  _ 

(g*  4-  fe^g*p,)’  —  g*(l  4-  bg“)(g*  4-  fc’g“p)  ’ 

oil  (g*  4-  fc’g"p.)’  -  e*(l  +  +  ^’e“P) 

=  /i’g“+*{2p.  -  p)  4-  /i‘g*"pj  -  6g“+*  -  ^*g*“+*p 
>  cg<" 

=  0(g^“^“)  ,  si  a  <  1  . 


(1  4-  6g“)’g< 

^  [g*  4-  /i*g“pl* 

[g^  4-  At^g°p]^  -  (1  4-  tg°)^g* 

(g*  4-  h?eop]^ 

(g*  4-  /i*g“P  -  (1  4-  6g")g*)lg»  +  /i^g-p  4-  (1  4-  6g“)g*] 
(g2  4-  ;i*g“p]* 

g“(fc’p  -  6g*][g*  4-  fe*g“P4-  (1  4-  6g“)g*] 

(g*  4-  Mg“pl’  ’ 

oil  A*p  —  be^  <  e(g“  V  e) 


s 

A 


0(g“~*),  a  >  1 

0(1),  o<l 


[g*  4-  /i*g“p,](g*  4-  A*g“p]* 
g“(fe^p  -  6g^|[g»  4-  h^£“P  4-  (1  4-  &g°)g^] 

[g*  4-  A*g®p]* 

_ fcV _ 

[g*  4-  /i’g“p,|(A*p  —  6g*](g*  4-  fc*g“p  4-  (l  4-  fcg“)g*l 


0(i),  a>l 

0(g"-®“),  a<l 


On  obtient  done  les  estimations  suivantes: 

•  si  a  >  1  , 

E((^fc+i  -  <  cexp{-ct»+ii}  4-  eg*”-* 


(22) 
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•  si  O!  <  1  ,  puisque  1  —  A  < 


£[(^*+1  -  Mk+iY]  <  cexp{(4(^^  +  1)(1  -  a)  -  o]  logs}  + 


Il.l.ii)  D’autre  part, 

<  (1  -  -  Mo)*]  +  ~ 

A 

oil 

+  h^p^] 

D  [e’  +  K^eopY 

A  ~  -  5e*](e*  +  /i*£“P  +  (1  +  6£“)e*] 

[e*  +  A2ff“p]2 

_  e*(<r*f*  +  /i*p*] 

[h^p  -  6c’](e’  +  ^^e“p+  (1  +  6e®)e*| 

f  0(e)  ,  a>  1 

1  0(e*-“),  a<  1  ’ 

done 

•  si  o  >  1  , 


E[(X*+i  -  Mfc+O’I  <  exp{-ct*+»i}E((Xo  -  Mo)*]  +  cf 


•  si  a  <  1  , 


-  M*+i)’]  <  exp{-ct*+,^}E[(;»ro  -  Mo)*]  +  ce*-“ 


C’est  en  fait  le  m€me  r^sultat  qui  a  4t4  obtenu  dans  le  cas  de  I’utilisation  du 
gain  stationnaire. 

On  explicitera  maintenant  quolques  expressions  possibles  pour  I’approxi- 
mation  du  gain,  9  (ou  de  la  variance,  p): 

Supposons  que  A  >  0  et  <t  >  0. 


(a)  si  a  >  2,  le  d^veloppement  de  Taylor  de  Pa  (c)  nous  donne,  apris  quelques 
calculs, 

Pa{e)  =  2hae  +  O(e’) 


et  done 


Si  on  prend,  par  exemple, 

,  i.e.  p,  -  p  =  0  (e*)  ,  alors 


1. 


El(Xk  -  M*)*]  <  cexp{-ct*-}  +  ce* 


2. 


SI 


i  a  7^:  2  ,i.e.  p,  —  P  =  0  (ff“  V  e*)  ,  alors 


E[(^t  -  Mkf\  <  cexp{-cefci}  +  c(e*“-‘  V  e^) 


3. 


(24) 


si  2  <  a  <  3  p,  ~  p  =  0  (e*)  ,  alors 


E[(^k  -  A^*)’I  <  cexp{-c<*i}  +  ee^  . 

(b)  si  1  <  o  <  2,  du  d4veloppement  limits  de  Po(e)  results: 

p.  =  +  ye"  +  0  (e*  V 

done  on  peut  prendre  eomme  approximation  de  p,  par  exemple: 
,  i.e.,  p,  —  P  =  0  (e®)  et  alors 


1. 


£;[(J%^fc  -  Affc)*]  <  cexp{-ct*-}  +  ce*"  ‘ 


(25) 


2. 


p  =  f  e  + 


,  i.e.,  p,  —  p  :=  0  (c*  V  e*“  ^)  et  alors 


E[{SCi,  -  Affc)’)  <  cexp{-ctfci}  +  c(e*  V  e*®"*) 
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(c)  si  a  =  1,  le  ddveloppement  limits  de  Pai^)  est  un  peu  particulier: 

Pa{e)  =  chy/4  +  a^e  +  +  0(ff*)  , 

ce  qui  entraine: 

,a  I  <r*.  ,6  ba  , 

=  ^hV  —  ~~r~  • 

2fe^l  +  — 

+  Oie>). 

On  propose,  par  exemple,  les  approximations  suivantes: 

1. 


P  =  (iv/l  +  +  ^)e  I  i.c-  P$-P  =  0  (e*)  et  alors 


E[{Xk  -  A/fc)*j  <  cexp{-ct*i}  +  ce* 


2. 


p  =  (f  n/1  +  ^  +  ^i)e  +  i'-  y  \  i.e.  p.  -  p  = 

et  alors 


E\iXk  -  Mkf\  <  cexp{-ct*-}  +  ce*  . 
(d)  si  0  <  a  <  1,  le  d^veloppement  de  la  fonction  Pa{s)  devieqt 

=  aVe“  +  OCe*-) 

d’oi 

p,  =  o’e®  +  0(e*~®)  . 


Done,  si  on  prend  p  =  o’e®  ,  i.e.  p,  —  P  =  0  (e*~®),  alors 

E[(Xk  -  MkY\  <  cexp{[4(^^  +  l)(l  -  a)  -  a]  loge]}  +  ce' 


.8-7 


0(e*) 
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II.2.  On  essaie  maintenant  de  d4crire  une  approximation  de  0,  sans  passer 
par  le  d^veloppement  de 

=  - - 

i  .  in  a (2*  +  6®c“)£*  +  a^h^e“  +  Pa(e) 

'  +*' - W’ - 

_  e“[{2b  +  +  a^h'^e“  +  Pa(g)l 

h[2£^  +  e“[{26  +  6*e*)5*  +  +  Pa(£)]] 

J. _ 1 _ 

h  2c* 

c“[(26  +  &*c®)e*  +  o^K^e°‘  +  Pa(e)] 

1  1 
A  1  +  Dq 

£  _ 2£^ _ 

c“((26  +  6®e'*)c*  +  a^h^e‘*  +  Po(e)] 

_ 2e*-° _ 

(26  +  6*c®)e*  +  cr^h^e^  4-  Pa(c) 

>  c(c“  V  c)  . 


oil  Dq 


et  pa(e) 


II.2.1  Si  o  >  1, 

n  ~2— f! _ L_ 

®  ~  ^c“2A<7c  Aae“-1  ’ 

puisque  Pa(ff)  =  2A<7ff  +  0  (c’  V  c*®"*). 

Une  approximation  de  0,  est,  par  exemple. 


=  5; 


1  + 


has 


a^l 


1  1 
h  1 


=  <T€ 


A<7e®“^ 
a— i 


On  prend  done  _ 

d  =  <rc®-* 
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Puisque 


hpe^ 

+  k^e<*p  ’ 

i.e.,  pour  ?  7^ 

de^ _ g 

^  fcc*(l  —  hd)  A.e“~*(l  —  Ag)  ’ 
ce  qui  donne,  dans  notre  cas, 

_  ae 

^  /i£““*(l  —  kae^~^)  /i{l  —  ahe‘‘~^)  ’ 


on  obtient,  en  utilisant  le  d^veloppement  limits  de 


J. _ 

ah£“~^ ’ 


p=^£  +  <T*e“  +  o’Aie*""*  +  0  (e*“-®) 

h 

d’oii 

p,  -  p  =  c(e®  V  c")  et,  si  a  >  2  ,  p,  -  p  >  0  . 

Done, 

i)  de  la  formule  (22),  on  d^duit  les  estimations  suivantes; 

-  pour  1  <  a  <  2, 

£;[(Xfc+i  -  Mfc+x)*l  <  cexp{~cffc+ii}  + 


-  pour  a  >  2, 

E[{Xk+i  -  A^fc+i)*l  <  cexp{-ctfc+i7}  +  ce’ 


(26) 


(27) 


ii)  de  la  formule  (20), on  obtient: 

(28) 

Remarque  1.2  Une  fois  encore  on  trouve  des  estimations  de  — 

A^fc+i)*]  d’ordre  inferieur  ou  igal  k  celui  de  E[(Xfc+i  -  Mk+i)*]. 

Remarque  1.3  Le  schema  correspondant  aux  approximations  (24)  et  (25) 

oe®'*’* 

(i.e.  qui  utilise  le  gain  d  =  -5-^- — ^  “’*'  P®*  d’interfet  pratique,  puisque 

son  utilisation  oblige  a  un  calcul  plus  compliqu6  que  celui  du  schema  qu’on 
vient  d’obtenir  alors  que  I’ordre  de  I’erreur  associ^e  reste  le  m5me. 


f?((Xk+i  -  Mk+x)’l  <  exp{-ctfc+ii}E[(Xo  -  Mof]  +  ce 
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n.2.2.  Si  a  <  1, 

„  2e®  2 

Done  une  approximation  du  gain  stationnaire  $,  est,  par  example, 


et  on  pent  considdrer 
Le  schema  (9)  est  alors 


2 

h 


1 


1  + 


M*+i  =  (1  +  +  i(yfc+i  -  Ml  +  *0-^0 

= 

Les  formules  (10)  et  (11)  deviennent: 


(29) 


^*+1  -  Af*4.1  =  — 


k-f-l 


^(c*  +  fc*e“pfc+i|*) 


(y*+i-Mi  +  fcOAf*) 


hence


i.e. 


-  M*+o’]  =  - 


+  A*e“Pfc+i|t] 


$$E[(\hat X_{k+1} - M_{k+1})^2] =$$


On the other hand,


(30) 


Hence


$X_{k+1} - M_{k+1}$


Vk+i  - 


Vk+i 

h 


$$E[(X_{k+1} - M_{k+1})^2] =$$


(31) 
II.2.3 If $\alpha = 1$,


$D_\varepsilon =$


(26  4-  6*e“)e*  +  a^h'^e^  +  ahy/A  +  a^K^e  +  0  (c*) 
2 


«  - :  ■  =  T=  ■=;£  . 

ahy/A  +  <t*A* 

As $\varepsilon \to 0$, an approximation of $\theta_\varepsilon$ is, for example,


or, more simply,


We recover the scheme (9) and formulas (30) and (31).
1.2.2 Conclusion


We draw up the following tables. They give the rates of convergence to 0 of the mean-square error for the various approximate filters studied here, and show how these filters depend on the ratio of the observation-noise variance to the time step.


• For $\alpha > 1$


approximate-filter gain

error estimate

$E[(\hat X_{k+1} - M_{k+1})^2]$

$\theta_\varepsilon$ (stationary gain)

$c(\varepsilon^{2\alpha-1} \vee \varepsilon^{2})$

$1 < \alpha < 2$

c(e’  V  E<“-») 

c(e®‘’“^  V  e®) 

$2 \le \alpha$

ce'^ 

On the other hand, we have:

$$E[(X_{k+1} - \hat X_{k+1})^2] \le c\,e^{-c t_{k+1}/\varepsilon} + c\varepsilon.$$


• For $\alpha = 1$

approximate-filter gain


$\theta_\varepsilon$ (stationary gain)


error estimate


1 

fc 

CE 

„yfl+s2^  +  S^ 

CE* 

V‘+'^ 

CE® 

i+M«-\/i+*^+^)+(fc+— T^s^). 

On the other hand, we have:

$$E[(X_{k+1} - \hat X_{k+1})^2] \le c\,e^{-c t_{k+1}/\varepsilon} + c\varepsilon.$$


• For $\alpha < 1$
approximate-filter gain

error estimate

$\theta_\varepsilon$ (stationary gain)

1 

* 


On the other hand, we have:

$$E[(X_{k+1} - \hat X_{k+1})^2] \le c\,e^{-c t_{k+1}/\varepsilon} + c\varepsilon^{2-\alpha}.$$




1.3 Construction of the Approximate Filters (case h linear)


We again assume that $\gamma = 0$, but we now allow the function $b$ to be nonlinear.

We obtain the mean-square error estimates with a change-of-probability method. For that reason, we write our nonlinear system in the following form:

$$X_{k+1} = X_k + b(X_k)\Delta t + \sigma\sqrt{\Delta t}\,w_{k+1}, \qquad X_0 = \xi,$$

$$y_{k+1} = hX_k + \frac{\varepsilon}{\sqrt{\Delta t}}\,v_{k+1}, \qquad y_0 = 0,$$

where $\xi$ is a random variable with probability law $P_0$ such that

$$\int |P_0'(x)|\,dx < \infty, \qquad P_0 \in C^{1}.$$

We still assume that $Y_0^k$ is the σ-algebra of the observations up to time $k$:

$$Y_0^k = \sigma(y_0, y_1, \ldots, y_k).$$

We want to study the "quality" of the approximation

$$M_{k+1} = M_k + b(M_k)\Delta t + \theta(y_{k+1} - hM_k), \qquad M_0 = m_0, \qquad (35)$$

which corresponds to a prediction step: $\hat X_k = E[X_k \mid Y_0^k]$.

Assume, in addition, that $b$ is a function with bounded derivatives.
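Scheme (35) is easy to simulate. The sketch below is a hypothetical illustration, and every concrete choice in it is ours:

- the drift $b(x) = -x + 0.5\sin x$, which has bounded derivatives;
- a standard Gaussian initial law and $m_0 = 0$;
- the parameter values.

It uses the gain $\theta = \sigma\Delta t/\varepsilon$ with $\Delta t = \varepsilon^\alpha$ and estimates $E[(X_K - M_K)^2]$ by Monte Carlo. By estimate (38), this quantity should be of order $\varepsilon$ once $t_K \gg \varepsilon$.

```python
import math
import random


def simulate_mse(eps, alpha=2.0, sigma=1.0, h=1.0, T=1.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[(X_K - M_K)^2] at time T for the scheme (35).

    Hypothetical choices: dt = eps**alpha, gain theta = sigma*dt/eps,
    drift b(x) = -x + 0.5*sin(x), X_0 ~ N(0, 1), m_0 = 0.
    """
    rng = random.Random(seed)
    dt = eps ** alpha
    theta = sigma * dt / eps
    n_steps = int(round(T / dt))

    def b(x):
        return -x + 0.5 * math.sin(x)  # bounded derivative: |b'| <= 1.5

    total = 0.0
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)
        m = 0.0
        for _ in range(n_steps):
            w = rng.gauss(0.0, 1.0)
            v = rng.gauss(0.0, 1.0)
            y = h * x + eps / math.sqrt(dt) * v            # y_{k+1} = h X_k + (eps/sqrt(dt)) v_{k+1}
            m = m + b(m) * dt + theta * (y - h * m)        # M_{k+1}, scheme (35)
            x = x + b(x) * dt + sigma * math.sqrt(dt) * w  # X_{k+1}
        total += (x - m) ** 2
    return total / n_paths


if __name__ == "__main__":
    for eps in (0.1, 0.05):
        mse = simulate_mse(eps)
        print(f"eps={eps}: E[(X-M)^2] ~ {mse:.4f}, ratio to eps = {mse / eps:.2f}")
```

The ratio of the mean-square error to $\varepsilon$ should stay of order one when $\varepsilon$ is halved.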


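Let us begin by estimating $X_k - M_k$:

$$\begin{aligned} X_{k+1} - M_{k+1} &= X_k + b(X_k)\Delta t + \sigma\sqrt{\Delta t}\,w_{k+1} - M_k - b(M_k)\Delta t - \theta(y_{k+1} - hM_k) \\ &= (X_k - M_k) + [b(X_k) - b(M_k)]\Delta t + \sigma\sqrt{\Delta t}\,w_{k+1} - \theta\Big(hX_k + \frac{\varepsilon}{\sqrt{\Delta t}}v_{k+1} - hM_k\Big) \\ &= (1 - h\theta)(X_k - M_k) + [b(X_k) - b(M_k)]\Delta t + \sigma\sqrt{\Delta t}\,(w_{k+1} - v_{k+1}), \end{aligned}$$

where, in the last line, the gain is $\theta = \sigma\Delta t/\varepsilon$, so that $\theta\varepsilon/\sqrt{\Delta t} = \sigma\sqrt{\Delta t}$.

Since

$$b(X_k) = b(M_k) + b'(\xi_k)(X_k - M_k)$$

for some $\xi_k$ between $X_k$ and $M_k$,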
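Now, $|b'| \le c_b$, and $w_{k+1} - v_{k+1}$ is independent of $X_k - M_k$ with variance 2. Hence

$$E[(X_{k+1} - M_{k+1})^2] \le \Big(1 - \sigma h\frac{\Delta t}{\varepsilon} + c_b\Delta t\Big)^{2}E[(X_k - M_k)^2] + 2\sigma^{2}\Delta t,$$

and (37) becomes $E[(X_{k+1} - M_{k+1})^2] \le (1 - \Lambda)E[(X_k - M_k)^2] + 2\sigma^2\Delta t$, with

$$\Lambda = 1 - \Big(1 - \sigma h\frac{\Delta t}{\varepsilon} + c_b\Delta t\Big)^{2} = \Big(\sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)\Big(2 - \sigma h\frac{\Delta t}{\varepsilon} + c_b\Delta t\Big) = O\Big(\frac{\Delta t}{\varepsilon}\Big), \qquad \Lambda > c\,\frac{\Delta t}{\varepsilon},$$

so that

$$\frac{\Delta t}{\Lambda} = O(\varepsilon)$$

and

$$E[(X_{k+1} - M_{k+1})^2] \le c\exp\Big\{-c\frac{t_{k+1}}{\varepsilon}\Big\} + c\varepsilon, \qquad (38)$$

or, more precisely,

$$E[(X_k - M_k)^2] \le c(1 - \Lambda)^{k} + c\varepsilon. \qquad (39)$$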
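The step from the recursion to (38)-(39) can be checked with a short deterministic sketch. It makes three checks:

1. $\Lambda = q(2 - q)$ with $q = \sigma h\Delta t/\varepsilon - c_b\Delta t$.
2. $\Delta t/\Lambda \le \varepsilon/(\sigma h - c_b\varepsilon)$, which is the "$O(\varepsilon)$" claim.
3. The worst-case recursion $e_{k+1} = (1 - \Lambda)e_k + 2\sigma^2\Delta t$ stays below $(1-\Lambda)^k e_0 + 2\sigma^2\Delta t/\Lambda$.

The parameter values, including the bound $c_b$ on $|b'|$, are illustrative assumptions.

```python
def contraction(sigma, h, eps, dt, c_b):
    """Return (q, Lambda) with q = sigma*h*dt/eps - c_b*dt and Lambda = 1 - (1 - q)**2."""
    q = sigma * h * dt / eps - c_b * dt
    return q, 1.0 - (1.0 - q) ** 2


def worst_case_errors(e0, sigma, h, eps, dt, c_b, n):
    """Iterate e_{k+1} = (1 - Lambda) e_k + 2 sigma^2 dt, the worst case of the recursion."""
    _, lam = contraction(sigma, h, eps, dt, c_b)
    errors = [e0]
    for _ in range(n):
        errors.append((1.0 - lam) * errors[-1] + 2.0 * sigma**2 * dt)
    return errors


if __name__ == "__main__":
    sigma, h, c_b, eps = 1.0, 1.0, 1.5, 0.05
    dt = eps ** 2  # Delta t = eps^alpha with alpha = 2 (assumption)
    q, lam = contraction(sigma, h, eps, dt, c_b)
    print("Lambda =", lam, " dt/Lambda =", dt / lam, " eps =", eps)
    print("stationary level 2 sigma^2 dt / Lambda =", 2 * sigma**2 * dt / lam)
```

Since $(1-\Lambda)^k \le e^{-k\Lambda}$ and $\Lambda \ge c\,\Delta t/\varepsilon$, the transient term decays like $\exp\{-c\,t_k/\varepsilon\}$, as in (38).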


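1.2. Estimation of $\hat X_k - M_k$

Theorem 1.4 The scheme (35) satisfies

$$\hat X_k - M_k = \bar O(\varepsilon^{\alpha-1} \vee \sqrt{\varepsilon}),$$

in the sense that

$$E[|\hat X_k - M_k|] \le c(1 \vee \varepsilon^{2-\alpha})\exp\Big\{-c\frac{t_k}{\varepsilon}\Big\} + c(\varepsilon^{\alpha-1} \vee \sqrt{\varepsilon}). \qquad (*)$$

We shall use the notation

$$\eta_k = \bar O(\varepsilon^{q}),$$

where $\{\eta_k\}$ is a process depending on $\varepsilon$ and $q > 0$, to mean that

$$E[|\eta_k|] \le c_0(1 \vee \varepsilon^{2-\alpha})\exp\Big\{-c_1\frac{t_k}{\varepsilon}\Big\} + c_2\varepsilon^{q}; \qquad c_0, c_1, c_2 > 0.$$

We shall use the notation

$$\eta_k = O(\varepsilon^{q})$$

to mean that

$$E[|\eta_k|] \le c_0\exp\Big\{-c_1\frac{t_k}{\varepsilon}\Big\} + c_2\varepsilon^{q}.$$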
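Proof

The proof is divided into several parts. It uses changes of probability, a discrete version of Girsanov's theorem, and differentiation with respect to the initial condition.


Changes of probability.

(a) The first change of probability affects the law of $v$.
Consider the probability $\tilde P$ defined by

$$\frac{d\tilde P}{dP}\Big|_{\mathcal F_k} = L_k^{-1}, \qquad L_k^{-1} = \exp\Big\{-\sum_{i=1}^{k}\frac{\sqrt{\Delta t}}{\varepsilon}hX_{i-1}v_i - \frac12\sum_{i=1}^{k}\frac{\Delta t}{\varepsilon^{2}}h^{2}X_{i-1}^{2}\Big\},$$

$$\bar v_k = v_k - \Big(-\frac{\sqrt{\Delta t}}{\varepsilon}hX_{k-1}\Big) = \frac{\sqrt{\Delta t}}{\varepsilon}\,y_k,$$

where $X_{k-1}$ is $\mathcal F_{k-1}$-measurable.

This gives:

$$\begin{aligned} L_k &= \exp\Big\{\sum_{i=1}^{k}\frac{\sqrt{\Delta t}}{\varepsilon}hX_{i-1}\,\frac{\sqrt{\Delta t}}{\varepsilon}(y_i - hX_{i-1}) + \frac12\sum_{i=1}^{k}\frac{\Delta t}{\varepsilon^{2}}h^{2}X_{i-1}^{2}\Big\} \\ &= \exp\Big\{\frac{\Delta t}{\varepsilon^{2}}\Big(h\sum_{i=1}^{k}X_{i-1}y_i - h^{2}\sum_{i=1}^{k}X_{i-1}^{2}\Big) + \frac{\Delta t}{2\varepsilon^{2}}h^{2}\sum_{i=1}^{k}X_{i-1}^{2}\Big\} \\ &= \exp\Big\{\frac{\Delta t}{\varepsilon^{2}}h\sum_{i=1}^{k}X_{i-1}y_i - \frac{\Delta t}{2\varepsilon^{2}}h^{2}\sum_{i=1}^{k}X_{i-1}^{2}\Big\}. \end{aligned}$$


By the discrete version of Girsanov's theorem (see Appendix A), under $\tilde P$ (the reference probability), $w_k$ and $\frac{\sqrt{\Delta t}}{\varepsilon}y_k$ are independent $\mathcal F_k$ Gaussian white noises. ($\tilde P$ is equivalent to $P$ on each $\mathcal F_k$.)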
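The two expressions of $L_k$ above differ only by the substitution $y_i = hX_{i-1} + (\varepsilon/\sqrt{\Delta t})\,v_i$. This minimal sketch checks the algebra numerically on arbitrary, hypothetical values of $X_{i-1}$ and $v_i$.

```python
import math
import random


def log_L_from_v(xs, vs, h, eps, dt):
    """log L_k with the noise v: sum a h X_{i-1} v_i + (1/2) sum (a h X_{i-1})^2, a = sqrt(dt)/eps."""
    a = math.sqrt(dt) / eps
    return sum(a * h * x * v for x, v in zip(xs, vs)) + 0.5 * sum((a * h * x) ** 2 for x in xs)


def log_L_from_y(xs, ys, h, eps, dt):
    """log L_k with the observations: (dt/eps^2)(h sum X_{i-1} y_i - (h^2/2) sum X_{i-1}^2)."""
    return dt / eps**2 * (h * sum(x * y for x, y in zip(xs, ys)) - 0.5 * h**2 * sum(x * x for x in xs))


rng = random.Random(11)
h, eps, dt = 1.3, 0.1, 0.01
xs = [rng.gauss(0.0, 1.0) for _ in range(20)]  # X_0, ..., X_{k-1} (arbitrary values)
vs = [rng.gauss(0.0, 1.0) for _ in range(20)]  # v_1, ..., v_k
ys = [h * x + eps / math.sqrt(dt) * v for x, v in zip(xs, vs)]  # y_i = h X_{i-1} + (eps/sqrt(dt)) v_i
print(log_L_from_v(xs, vs, h, eps, dt), log_L_from_y(xs, ys, h, eps, dt))
```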
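(b) The second change of probability affects the law of $w$.
We define the new probability $\hat P$ by

$$\frac{d\hat P}{d\tilde P}\Big|_{\mathcal F_k} = \Lambda_k^{-1},$$

where

$$\Lambda_k^{-1} = \exp\Big\{\sum_{i=1}^{k}h\frac{\sqrt{\Delta t}}{\varepsilon}(X_{i-1} - M_{i-1})w_i - \frac12\sum_{i=1}^{k}h^{2}\frac{\Delta t}{\varepsilon^{2}}(X_{i-1} - M_{i-1})^{2}\Big\}.$$

Let

$$\hat w_k = w_k - h\frac{\sqrt{\Delta t}}{\varepsilon}(X_{k-1} - M_{k-1}),$$

where $X_{k-1} - M_{k-1}$ is $\mathcal F_{k-1}$-measurable.

(Recall that

$$X_{k+1} - M_{k+1} = \Big(1 - \sigma h\frac{\Delta t}{\varepsilon}\Big)(X_k - M_k) + [b(X_k) - b(M_k)]\Delta t + \sigma\sqrt{\Delta t}\,(w_{k+1} - v_{k+1}).)$$

Then

$$\begin{aligned} \Lambda_k &= \exp\Big\{-\sum_{i=1}^{k}h\frac{\sqrt{\Delta t}}{\varepsilon}(X_{i-1} - M_{i-1})\Big[\hat w_i + h\frac{\sqrt{\Delta t}}{\varepsilon}(X_{i-1} - M_{i-1})\Big] + \frac12\sum_{i=1}^{k}h^{2}\frac{\Delta t}{\varepsilon^{2}}(X_{i-1} - M_{i-1})^{2}\Big\} \\ &= \exp\Big\{-h\frac{\sqrt{\Delta t}}{\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\hat w_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})^{2}\Big\}. \end{aligned}$$

By the discrete version of Girsanov's theorem, under $\hat P$, $\hat w_k$ and $\frac{\sqrt{\Delta t}}{\varepsilon}y_k$ are independent Gaussian white noises, and $X_k$ satisfies:

$$X_{k+1} = \sigma h\frac{\Delta t}{\varepsilon}(X_k - M_k) + X_k + b(X_k)\Delta t + \sigma\sqrt{\Delta t}\,\hat w_{k+1}. \qquad (40)$$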
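The density of $P$ with respect to $\hat P$ is $L_k\Lambda_k$. Hence, for every $P$-integrable, $\mathcal F_k$-measurable random variable $\psi_k$,

$$E[\psi_k \mid Y_0^k] = \frac{\hat E[\psi_k L_k\Lambda_k \mid Y_0^k]}{\hat E[L_k\Lambda_k \mid Y_0^k]}.$$

Now,

$$L_k\Lambda_k = \exp\Big\{h\frac{\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}y_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}^{2}\Big\}\exp\Big\{-h\frac{\sqrt{\Delta t}}{\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\hat w_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})^{2}\Big\}.$$

Let

$$S = h\frac{\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}y_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}^{2} - h\frac{\sqrt{\Delta t}}{\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\hat w_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})^{2},$$

i.e.

$$L_k\Lambda_k = \exp S.$$

But, since $M_i = M_{i-1} + b(M_{i-1})\Delta t + \sigma\frac{\Delta t}{\varepsilon}(y_i - hM_{i-1})$,

$$\begin{aligned} X_i - M_i &= (X_{i-1} - M_{i-1}) + [b(X_{i-1}) - b(M_{i-1})]\Delta t + \sigma\sqrt{\Delta t}\,w_i - \sigma\frac{\Delta t}{\varepsilon}(y_i - hM_{i-1}) \\ &= (X_{i-1} - M_{i-1}) + [b(X_{i-1}) - b(M_{i-1})]\Delta t + \sigma\sqrt{\Delta t}\,\hat w_i + \sigma h\frac{\Delta t}{\varepsilon}(X_{i-1} - M_{i-1}) - \sigma\frac{\Delta t}{\varepsilon}(y_i - hM_{i-1}) \\ &= \Big(1 + \sigma h\frac{\Delta t}{\varepsilon}\Big)(X_{i-1} - M_{i-1}) + [b(X_{i-1}) - b(M_{i-1})]\Delta t + \sigma\sqrt{\Delta t}\,\hat w_i - \sigma\frac{\Delta t}{\varepsilon}(y_i - hM_{i-1}), \end{aligned}$$

whence

$$\hat w_i = \frac{1}{\sigma\sqrt{\Delta t}}\Big[(X_i - M_i) - \Big(1 + \sigma h\frac{\Delta t}{\varepsilon}\Big)(X_{i-1} - M_{i-1})\Big] - \frac{\sqrt{\Delta t}}{\sigma}[b(X_{i-1}) - b(M_{i-1})] + \frac{\sqrt{\Delta t}}{\varepsilon}(y_i - hM_{i-1}).$$

Hence

$$\begin{aligned} S ={}& h\frac{\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}y_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}X_{i-1}^{2} - \frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})(X_i - M_i) \\ &+ \frac{h}{\sigma\varepsilon}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon}\Big)\sum_{i=1}^{k}(X_{i-1} - M_{i-1})^{2} + \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})[b(X_{i-1}) - b(M_{i-1})] \\ &- h\frac{\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})(y_i - hM_{i-1}) - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})^{2} \end{aligned} \qquad (41)$$

and, after collecting the terms,

$$S = -\frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big[(X_i - M_i) - (X_{i-1} - M_{i-1})\big] + \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})[b(X_{i-1}) - b(M_{i-1})] + \frac{h\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}M_{i-1}y_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}M_{i-1}^{2}.$$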
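Let $u_i = X_i - M_i$. We have:

$$\sum_{i=1}^{k}(u_i + u_{i-1})(u_i - u_{i-1}) = \sum_{i=1}^{k}(u_i^{2} - u_{i-1}^{2}),$$

i.e.

$$\sum_{i=1}^{k}(u_i + u_{i-1})(u_i - u_{i-1}) = 2\sum_{i=1}^{k}u_{i-1}(u_i - u_{i-1}) + \sum_{i=1}^{k}(u_i - u_{i-1})^{2}.$$

Then

$$\sum_{i=1}^{k}u_{i-1}(u_i - u_{i-1}) = \frac12\sum_{i=1}^{k}(u_i^{2} - u_{i-1}^{2}) - \frac12\sum_{i=1}^{k}(u_i - u_{i-1})^{2} \qquad (42)$$

and

$$S = -\frac{h}{2\sigma\varepsilon}\Big[u_k^{2} - u_0^{2} - \sum_{i=1}^{k}(u_i - u_{i-1})^{2}\Big] + \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})[b(X_{i-1}) - b(M_{i-1})] + \frac{h\Delta t}{\varepsilon^{2}}\sum_{i=1}^{k}M_{i-1}y_i - \frac{h^{2}\Delta t}{2\varepsilon^{2}}\sum_{i=1}^{k}M_{i-1}^{2}. \qquad (43)$$

Note that the third and fourth terms on the right-hand side of this expression are $Y_0^k$-adapted. They therefore disappear in the normalization.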
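The summation-by-parts identity (42) is purely algebraic. A quick check on an arbitrary sequence $u_0, \ldots, u_k$ (random values, chosen only for illustration):

```python
import random


def lhs(u):
    """sum_{i=1}^{k} u_{i-1} (u_i - u_{i-1})"""
    return sum(u[i - 1] * (u[i] - u[i - 1]) for i in range(1, len(u)))


def rhs(u):
    """(1/2)(u_k^2 - u_0^2) - (1/2) sum_{i=1}^{k} (u_i - u_{i-1})^2, the right-hand side of (42)"""
    k = len(u) - 1
    return 0.5 * (u[k] ** 2 - u[0] ** 2) - 0.5 * sum((u[i] - u[i - 1]) ** 2 for i in range(1, k + 1))


rng = random.Random(3)
u = [rng.gauss(0.0, 1.0) for _ in range(50)]
print(lhs(u), rhs(u))
```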


Differentiation with respect to the initial condition

We regard the variables that appear in our computations as functions of $X_0$ (the initial condition) and of the processes $\hat w_k$ and $y_k$.
Let $\psi_k = \psi(X_0, \hat w_k, y_k)$.

We want to differentiate (40) and (43).

We therefore define the processes

$$Z_t = \frac{\partial X_t}{\partial X_0}, \qquad Z_{lt} = Z_tZ_l^{-1}, \qquad Z_{0t} = Z_t,$$

and fix $k$.
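For $n \ge k$, $Z_{nk}$ satisfies:

$$Z_{nk} = \prod_{j=k}^{n-1}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_j)\Big)^{-1}, \qquad Z_{kk} = 1. \qquad (44)$$

Proof

From (40) it follows that

$$X_{i+1} = \sigma h\frac{\Delta t}{\varepsilon}(X_i - M_i) + X_i + b(X_i)\Delta t + \sigma\sqrt{\Delta t}\,\hat w_{i+1}.$$

Since $M_i$ and $\hat w_{i+1}$ do not depend on $X_0$,

$$\frac{\partial X_{i+1}}{\partial X_0} = \sigma h\frac{\Delta t}{\varepsilon}\frac{\partial X_i}{\partial X_0} + \frac{\partial X_i}{\partial X_0} + \Delta t\,b'(X_i)\frac{\partial X_i}{\partial X_0},$$

i.e.

$$Z_{i+1} = \Big(1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_i)\Big)Z_i, \qquad \frac{Z_{i+1}}{Z_i} = 1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_i).$$

By induction,

$$\frac{Z_{i+m}}{Z_i} = \prod_{j=i}^{i+m-1}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_j)\Big).$$

For $n \ge k$,

$$Z_{nk} = \prod_{j=k}^{n-1}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_j)\Big)^{-1}.$$

We have the following bound. If $|b'| \le c_b$,

$$Z_{nk} \le \prod_{j=k}^{n-1}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-1} = \Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(n-k)}, \qquad (45)$$

i.e.

$$Z_{nk} \le c\exp\Big\{-c(n - k)\frac{\Delta t}{\varepsilon}\Big\} \le c\exp\Big\{-c\frac{t_n - t_k}{\varepsilon}\Big\}.$$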
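The bound (45) and its exponential form can be illustrated numerically. This sketch uses hypothetical values of $b'(X_j)$, drawn uniformly in $[-c_b, c_b]$, together with the elementary inequality $\log(1+q) \ge q/(1+q)$.

```python
import math
import random


def z_nk(bprime_values, sigma, h, eps, dt):
    """Z_{nk} = prod_{j=k}^{n-1} (1 + sigma*h*dt/eps + dt*b'(X_j))^{-1}, cf. (44)."""
    z = 1.0
    for bp in bprime_values:
        z /= 1.0 + sigma * h * dt / eps + dt * bp
    return z


sigma, h, eps, c_b = 1.0, 1.0, 0.05, 1.5
dt = eps ** 2                                   # Delta t = eps^alpha with alpha = 2 (assumption)
rng = random.Random(5)
m = 200                                         # n - k
bps = [rng.uniform(-c_b, c_b) for _ in range(m)]  # hypothetical values of b'(X_j), |b'| <= c_b
q = sigma * h * dt / eps - c_b * dt
z = z_nk(bps, sigma, h, eps, dt)
bound_45 = (1.0 + q) ** (-m)                    # right-hand side of (45)
bound_exp = math.exp(-m * q / (1.0 + q))        # uses log(1 + q) >= q / (1 + q)
print(z, bound_45, bound_exp)
```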
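On the other hand,

$$\begin{aligned} \frac{\partial}{\partial X_0}\log(L_k\Lambda_k) ={}& -\frac{h}{\sigma\varepsilon}\Big[(X_k - M_k)Z_k - (X_0 - M_0) - \sum_{i=1}^{k}(u_i - u_{i-1})(Z_i - Z_{i-1})\Big] \\ &+ \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}\Big\{Z_{i-1}[b(X_{i-1}) - b(M_{i-1})] + (X_{i-1} - M_{i-1})b'(X_{i-1})Z_{i-1}\Big\} \\ ={}& -\frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_i - M_i)Z_{i-1} - \frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big[Z_i - 2Z_{i-1} - \Delta t\,b'(X_{i-1})Z_{i-1}\big] \\ &+ \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}[b(X_{i-1}) - b(M_{i-1})]Z_{i-1}. \end{aligned}$$


Asymptotic expression for $X_k - M_k$.
The previous equality gives:

$$\frac{h}{\sigma\varepsilon}(X_k - M_k)Z_{k-1} = -\frac{h}{\sigma\varepsilon}\sum_{i=1}^{k-1}(X_i - M_i)Z_{i-1} - \frac{\partial}{\partial X_0}\log(L_k\Lambda_k) - \frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big[Z_i - 2Z_{i-1} - \Delta t\,b'(X_{i-1})Z_{i-1}\big] + \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}[b(X_{i-1}) - b(M_{i-1})]Z_{i-1}.$$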
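Hence

$$\begin{aligned} \frac{h}{\sigma\varepsilon}(X_k - M_k)Z_{k-1} ={}& -\frac{h}{\sigma\varepsilon}\sum_{i=2}^{k}(X_{i-1} - M_{i-1})Z_{i-2} - \frac{\partial}{\partial X_0}\log(L_k\Lambda_k) \\ &- \frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big(Z_i - 2Z_{i-1} - \Delta t\,b'(X_{i-1})Z_{i-1}\big) + \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}[b(X_{i-1}) - b(M_{i-1})]Z_{i-1} \\ ={}& -\frac{h}{\sigma\varepsilon}\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big(Z_i - 2Z_{i-1} - \Delta t\,b'(X_{i-1})Z_{i-1} + Z_{i-2}\big) \\ &+ \frac{h\Delta t}{\sigma\varepsilon}\sum_{i=1}^{k}[b(X_{i-1}) - b(M_{i-1})]Z_{i-1} - \frac{\partial}{\partial X_0}\log(L_k\Lambda_k), \end{aligned}$$

with $Z_{-1} = 0$.

Multiplying by $\frac{\sigma\varepsilon}{h}Z_{k-1}^{-1}$ and using $Z_jZ_{k-1}^{-1} = Z_{k-1,j}$:

$$\begin{aligned} X_k - M_k ={}& -\sum_{i=1}^{k}(X_{i-1} - M_{i-1})\big[Z_{k-1,i} - 2Z_{k-1,i-1} - \Delta t\,b'(X_{i-1})Z_{k-1,i-1} + Z_{k-1,i-2}\big] \\ &+ \Delta t\sum_{i=1}^{k}[b(X_{i-1}) - b(M_{i-1})]Z_{k-1,i-1} - \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} \\ ={}& \sum_{i=1}^{k}(X_{i-1} - M_{i-1})\Big[(Z_{k-1,i-1} - Z_{k-1,i}) - (Z_{k-1,i-2} - Z_{k-1,i-1}) + \Delta t\big(b'(\xi_{i-1}) + b'(X_{i-1})\big)Z_{k-1,i-1}\Big] \\ &- \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0}, \end{aligned} \qquad (46)$$

where $b(X_{i-1}) - b(M_{i-1}) = b'(\xi_{i-1})(X_{i-1} - M_{i-1})$.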


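Now,

$$Z_{k-1,i} - Z_{k-1,i-1} = \Big(\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-1})\Big)Z_{k-1,i-1}, \qquad (47)$$

$$Z_{k-1,i-1} - Z_{k-1,i-2} = (1 - Z_{i-1,i-2})Z_{k-1,i-1} = \Big(1 - \frac{1}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}\Big)Z_{k-1,i-1} = \frac{\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}\,Z_{k-1,i-1} \quad \text{if } i \ge 2, \qquad (48)$$

hence

• For $i \ge 2$,

$$(Z_{k-1,i-1} - Z_{k-1,i}) - (Z_{k-1,i-2} - Z_{k-1,i-1}) = -\Big(\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-1})\Big)Z_{k-1,i-1} + \frac{\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}Z_{k-1,i-1} = \frac{\mathrm{Num}}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})}\,Z_{k-1,i-1},$$

where

$$\begin{aligned} \mathrm{Num} &= \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2}) - \Big(\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-1})\Big)\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})\Big) \\ &= -\sigma^{2}h^{2}\frac{\Delta t^{2}}{\varepsilon^{2}} + \Delta t\big(b'(X_{i-2}) - b'(X_{i-1})\big) - \sigma h\frac{\Delta t^{2}}{\varepsilon}\big[b'(X_{i-1}) + b'(X_{i-2})\big] - \Delta t^{2}b'(X_{i-1})b'(X_{i-2}) \\ &= O\Big(\Delta t^{3/2} \vee \frac{\Delta t^{2}}{\varepsilon^{2}}\Big) \text{ in } L^{2}, \quad \text{and} \quad \mathrm{Num} = O\Big(\Delta t \vee \frac{\Delta t^{2}}{\varepsilon^{2}}\Big) \text{ in } L^{\infty}. \end{aligned}$$

• For $i = 1$ (recall $Z_{-1} = 0$),

$$(Z_{k-1,0} - Z_{k-1,1}) - (Z_{k-1,-1} - Z_{k-1,0}) = Z_{k-1,0} - (Z_{k-1,1} - Z_{k-1,0}) = Z_{k-1,0} - \Big(\sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_0)\Big)Z_{k-1,0} = \Big(1 - \sigma h\frac{\Delta t}{\varepsilon} - \Delta t\,b'(X_0)\Big)Z_{k-1,0}.$$

Equality (46) becomes:

$$\begin{aligned} X_k - M_k ={}& \sum_{i=2}^{k}(X_{i-1} - M_{i-1})\Big[\frac{\mathrm{Num}}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})} + \Delta t\big(b'(\xi_{i-1}) + b'(X_{i-1})\big)\Big]Z_{k-1,i-1} \\ &+ (X_0 - M_0)\Big(1 - \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(\xi_0)\Big)Z_{k-1,0} - \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0}. \end{aligned}$$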
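The expansion of Num and its $L^\infty$ bound are elementary algebra. This sketch checks them on a few arbitrary values, writing $s = \sigma h\Delta t/\varepsilon$ and taking $|b'| \le c_b = 1.5$ as an illustrative assumption. The bound used is $|\mathrm{Num}| \le s^2 + 2c_b\Delta t + 2c_b s\Delta t + c_b^2\Delta t^2$, which is $O(\Delta t \vee \Delta t^2/\varepsilon^2)$.

```python
def num_direct(s, dt, bp1, bp2):
    """Num = s + dt b'(X_{i-2}) - (s + dt b'(X_{i-1}))(1 + s + dt b'(X_{i-2})), s = sigma h dt/eps."""
    return s + dt * bp2 - (s + dt * bp1) * (1.0 + s + dt * bp2)


def num_expanded(s, dt, bp1, bp2):
    """-s^2 + dt (b'_{i-2} - b'_{i-1}) - s dt (b'_{i-1} + b'_{i-2}) - dt^2 b'_{i-1} b'_{i-2}."""
    return -s * s + dt * (bp2 - bp1) - s * dt * (bp1 + bp2) - dt * dt * bp1 * bp2


def num_linf_bound(s, dt, c_b):
    """Crude L-infinity bound when |b'| <= c_b."""
    return s * s + 2 * c_b * dt + 2 * c_b * s * dt + (c_b * dt) ** 2


if __name__ == "__main__":
    for s, dt, bp1, bp2 in [(0.05, 0.0025, 1.2, -0.7), (0.3, 0.03, -1.5, 1.5)]:
        print(num_direct(s, dt, bp1, bp2), num_expanded(s, dt, bp1, bp2), num_linf_bound(s, dt, 1.5))
```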
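Let

$$\varphi_i = \frac{\mathrm{Num}}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})} + \Delta t\big(b'(\xi_{i-1}) + b'(X_{i-1})\big), \quad \text{for } i \ge 2,$$

$$\varphi_1 = 1 - \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(\xi_0).$$

Then

$$|\varphi_i| \le \rho_{i-1}, \quad i \ge 2,$$

$$\rho_{i-1} = \frac{|\mathrm{Num}|}{1 + \sigma h\frac{\Delta t}{\varepsilon} + \Delta t\,b'(X_{i-2})} + 2c_b\Delta t = O\Big(\frac{\Delta t^{2}}{\varepsilon^{2}} \vee \Delta t\Big) \quad \text{in } L^{1} \text{ and in } L^{\infty}.$$

Let $\rho$ be such that $\rho_i \le \rho$ for all $i \ge 1$.

By (45),

$$Z_{nk} \le \Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(n-k)},$$

hence

$$E\Big[\Big|X_k - M_k + \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0}\Big|\Big] \le \rho\sum_{i=2}^{k}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(k-i)}E[|X_{i-1} - M_{i-1}|] + \Big(1 - \sigma h\frac{\Delta t}{\varepsilon} + c_b\Delta t\Big)\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(k-1)}E[|X_0 - M_0|].$$

Since

$$E[|X_i - M_i|] \le \sqrt{E[(X_i - M_i)^2]} \le [c(1 - \Lambda)^{i} + c\varepsilon]^{1/2} \quad \text{(see result (39))},$$

we have:

$$\sum_{i=1}^{k}E\big[|X_{i-1} - M_{i-1}|\,|\varphi_i|\,|Z_{k-1,i-1}|\big] \le c\rho\sum_{i=2}^{k}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(k-i)}\big\{(1 - \Lambda)^{i-1} + \varepsilon\big\}^{1/2} + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|]. \qquad (49)$$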
Write $q = \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t$, so that $(1 - \Lambda)^{1/2} = 1 - q$. Then

$$\begin{aligned} \sum_{i=1}^{k}E\big[|X_{i-1} - M_{i-1}|\,|\varphi_i|\,|Z_{k-1,i-1}|\big] &\le c\rho\sum_{i=2}^{k}(1 + q)^{-(k-i)}\big\{(1 - q)^{i-1} + c\sqrt{\varepsilon}\big\} + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|] \\ &= c\rho\Big\{(1 + q)^{-(k-2)}(1 - q)\sum_{j=0}^{k-2}\big[(1 + q)(1 - q)\big]^{j} + c\sqrt{\varepsilon}\sum_{j=0}^{k-2}(1 + q)^{-j}\Big\} + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|] \\ &\le c\rho\Big\{(1 + q)^{-(k-2)}\frac{1 - (1 - q^{2})^{k-1}}{q^{2}} + c\sqrt{\varepsilon}\,\frac{1 + q}{q}\Big\} + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|] \\ &\le c\rho\Big[\Big(\frac{\varepsilon}{\Delta t}\Big)^{2}\exp\Big\{-c\frac{t_k}{\varepsilon}\Big\} + \sqrt{\varepsilon}\,\frac{\varepsilon}{\Delta t}\Big] + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|], \end{aligned} \qquad (50)$$

since $q \ge c\,\Delta t/\varepsilon$.

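Once again, we need to consider two cases:

• If $\alpha \ge 2$, we have $\rho = c\varepsilon^{\alpha}$, and (50) becomes:

$$\sum_{i=1}^{k}E\big[|X_{i-1} - M_{i-1}|\,|\varphi_i|\,|Z_{k-1,i-1}|\big] \le c\varepsilon^{2-\alpha}\exp\Big\{-c\frac{t_k}{\varepsilon}\Big\} + c\varepsilon^{3/2},$$

hence

$$X_k - M_k + \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} = \bar O(\varepsilon^{3/2}).$$

Take the conditional expectation with respect to $Y_0^k$.

If we show that

$$E\Big[\frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} \,\Big|\, Y_0^k\Big] = O(\sqrt{\varepsilon}), \qquad (51)$$

then

$$\hat X_k - M_k = \bar O(\sqrt{\varepsilon}),$$

in the sense that

$$E[|\hat X_k - M_k|] \le c\varepsilon^{2-\alpha}\exp\Big\{-\mu\frac{t_k}{\varepsilon}\Big\} + c\sqrt{\varepsilon}; \qquad c, \mu > 0.$$

We use Lemma 1.6 to prove this result.

• If $\alpha < 2$, we have $\rho = c\varepsilon^{2\alpha-2}$, and (50) becomes:

$$\sum_{i=1}^{k}E\big[|X_{i-1} - M_{i-1}|\,|\varphi_i|\,|Z_{k-1,i-1}|\big] \le c\exp\Big\{-c\frac{t_k}{\varepsilon}\Big\} + c\varepsilon^{\alpha-1/2},$$

hence

$$X_k - M_k + \frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} = O(\varepsilon^{\alpha-1/2}).$$

The reasoning is the same as in the previous case.

Once again, take the conditional expectation with respect to $Y_0^k$.
If we show that

$$E\Big[\frac{\sigma\varepsilon}{h}\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} \,\Big|\, Y_0^k\Big] = O(\varepsilon^{\alpha-1}), \qquad (52)$$

then

$$\hat X_k - M_k = O(\varepsilon^{\alpha-1}),$$

in the sense that

$$E[|\hat X_k - M_k|] \le c\exp\Big\{-\mu\frac{t_k}{\varepsilon}\Big\} + c\varepsilon^{\alpha-1}; \qquad c, \mu > 0.$$

We use the same lemma to prove this result.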


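Remark 1.5 For $\alpha \ge 2$ we obtained an estimate valid for times far from the initial time. Nevertheless, one can also obtain an estimate valid near the initial time:

$$E[|\hat X_k - M_k|] \le c\exp\Big\{-\mu\frac{t_k}{\varepsilon}\Big\} + c\sqrt{\varepsilon}, \qquad c, \mu > 0, \qquad (53)$$

since

$$\begin{aligned} \sum_{i=1}^{k}E\big[|X_{i-1} - M_{i-1}|\,|\varphi_i|\,|Z_{k-1,i-1}|\big] &\le c\rho(1 + c\sqrt{\varepsilon})\sum_{i=2}^{k}\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(k-i)} \\ &\quad + \Big(1 - \sigma h\frac{\Delta t}{\varepsilon} + c_b\Delta t\Big)\Big(1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t\Big)^{-(k-1)}E[|X_0 - M_0|], \quad \text{because } 1 - \Lambda < 1, \\ &\le c\rho(1 + c\sqrt{\varepsilon})\,\frac{1 + \sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t}{\sigma h\frac{\Delta t}{\varepsilon} - c_b\Delta t} + \Big(1 - c\frac{\Delta t}{\varepsilon}\Big)^{k}E[|X_0 - M_0|] \\ &\le c\exp\Big\{-c\frac{t_k}{\varepsilon}\Big\} + c\varepsilon. \end{aligned}$$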


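Lemma 1.6

Let $\alpha > 1$.

Let $F_k$ be an adapted process (depending on $\varepsilon$) that is differentiable with respect to $X_0$ ($k = 0, 1, \ldots, K$).

Then, if the moments of $\{\partial F_k/\partial X_0\}_k$ are finite and if $F_k = O(\varepsilon^{q})$ for some $q > 0$,

$$E\Big[F_k\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} \,\Big|\, Y_0^k\Big] = -E\Big[\frac{\partial F_k}{\partial X_0}Z_{k-1,0} \,\Big|\, Y_0^k\Big] + O(\varepsilon^{q+1}).$$

For $F_k = 1$ this result is stronger than (51) and (52).


Proof

We first recall the following. If $g \in C^{1}$, and $g$ and $g'$ are integrable with respect to Lebesgue measure, then

$$\int g'(x)\,dx = 0.$$

In particular, let $\psi_k$ be a random variable, differentiable with respect to $X_0$, such that

$$\hat E\Big[\Big|\psi_k\frac{P_0'}{P_0}(X_0)\Big| + \Big|\frac{\partial\psi_k}{\partial X_0}\Big|\Big] < \infty. \qquad (56)$$

Take a version $\psi(x, \hat w_k, y_k)$ that is differentiable with respect to $x$, and set

$$g(x) = P_0(x)\,\psi(x, \hat w_k, y_k).$$

Then, since under $\hat P$ the initial condition $X_0$ has law $P_0$ and is independent of $(\hat w, y)$,

$$\hat E\Big[\frac{\partial\psi_k}{\partial X_0} + \psi_k\frac{P_0'}{P_0}(X_0) \,\Big|\, \hat w, y\Big] = 0. \qquad (57)$$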
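Identity (57) is an integration by parts (Stein-type) identity. This minimal sketch checks it by quadrature under the assumption that $P_0$ is the standard Gaussian, for which $P_0'/P_0(x) = -x$, and for two arbitrary smooth test functions $\psi$. The Gaussian choice and the trapezoidal rule are ours, not the text's.

```python
import math


def standard_gaussian(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_score(x):
    """P_0'(x) / P_0(x) for P_0 = N(0, 1) (an assumed choice of initial law)."""
    return -x


def identity_57(psi, dpsi, lo=-12.0, hi=12.0, n=24001):
    """Trapezoidal approximation of E[psi'(X_0) + psi(X_0) P_0'(X_0)/P_0(X_0)], X_0 ~ P_0."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for j in range(n):
        x = lo + j * step
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * (dpsi(x) + psi(x) * gaussian_score(x)) * standard_gaussian(x)
    return total * step


if __name__ == "__main__":
    print(identity_57(math.sin, math.cos))
    print(identity_57(lambda x: x**3 + math.cos(2 * x), lambda x: 3 * x**2 - 2 * math.sin(2 * x)))
```

Both printed values should be zero up to quadrature error.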

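We apply the change-of-probability formula:

$$\Phi_k := E\Big[F_k\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0} \,\Big|\, Y_0^k\Big] = \frac{\hat E\big[F_k\frac{\partial}{\partial X_0}\log(L_k\Lambda_k)Z_{k-1,0}\,L_k\Lambda_k \,\big|\, Y_0^k\big]}{\hat E[L_k\Lambda_k \mid Y_0^k]} = \frac{\hat E\big[F_k\frac{\partial}{\partial X_0}(L_k\Lambda_k)Z_{k-1,0} \,\big|\, Y_0^k\big]}{\hat E[L_k\Lambda_k \mid Y_0^k]}.$$

Since

$$\frac{\partial}{\partial X_0}\big(F_kL_k\Lambda_kZ_{k-1,0}\big) = \frac{\partial F_k}{\partial X_0}L_k\Lambda_kZ_{k-1,0} + F_k\frac{\partial}{\partial X_0}(L_k\Lambda_k)Z_{k-1,0} + F_kL_k\Lambda_k\frac{\partial Z_{k-1,0}}{\partial X_0},$$

we have:

$$\hat E\Big[F_k\frac{\partial}{\partial X_0}(L_k\Lambda_k)Z_{k-1,0} \,\Big|\, Y_0^k\Big] = \hat E\Big[\frac{\partial}{\partial X_0}(F_kL_k\Lambda_kZ_{k-1,0}) \,\Big|\, Y_0^k\Big] - \hat E\Big[\frac{\partial F_k}{\partial X_0}L_k\Lambda_kZ_{k-1,0} \,\Big|\, Y_0^k\Big] - \hat E\Big[F_kL_k\Lambda_k\frac{\partial Z_{k-1,0}}{\partial X_0} \,\Big|\, Y_0^k\Big].$$

Since $F_kL_k\Lambda_kZ_{k-1,0}$ satisfies the integrability condition (56), we apply (57) and obtain:

$$\hat E\Big[\frac{\partial}{\partial X_0}(F_kL_k\Lambda_kZ_{k-1,0}) \,\Big|\, Y_0^k\Big] = -\hat E\Big[F_kL_k\Lambda_kZ_{k-1,0}\frac{P_0'}{P_0}(X_0) \,\Big|\, Y_0^k\Big].$$

Then, dividing by $\hat E[L_k\Lambda_k \mid Y_0^k]$ and returning to $P$,

$$\Phi_k = -E\Big[\frac{P_0'}{P_0}(X_0)F_kZ_{k-1,0} \,\Big|\, Y_0^k\Big] - E\Big[\frac{\partial F_k}{\partial X_0}Z_{k-1,0} \,\Big|\, Y_0^k\Big] - E\Big[F_k\frac{\partial Z_{k-1,0}}{\partial X_0} \,\Big|\, Y_0^k\Big]. \qquad (58)$$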
Since we had assumed that:


-1 


• Finally, to estimate the error made in a filtering step, it suffices to note that scheme (35) also gives an approximation of $\check X_k = E[X_{k-1} \mid Y_0^k]$. Indeed,

$$\hat X_k - \check X_k = E[X_k \mid Y_0^k] - E[X_{k-1} \mid Y_0^k] = E\big[b(X_{k-1})\Delta t + \sigma\sqrt{\Delta t}\,w_k \mid Y_0^k\big] = \Delta t\,E[b(X_{k-1}) \mid Y_0^k],$$

so that

$$|\hat X_k - \check X_k| \le c\Delta t. \qquad (61)$$
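A Discrete version of Girsanov's theorem

Consider a filtration $(\mathcal F_k)_k$, $k \in \{0, \ldots\}$.

Let:

• $\{w_k\}$ be an $\mathcal F_k$ Gaussian white noise under the probability $P$;

• $\{\varphi_k\}$ be an $\mathcal F_{k-1}$-measurable (i.e. predictable) process such that

$$\sum_{i=0}^{\infty}\varphi_i^{2} < \infty \quad (P\text{-a.s.});$$

• $\{Z_k\}$ be the process defined by

$$Z_k = \exp\Big\{\sum_{i=1}^{k}\varphi_iw_i - \frac12\sum_{i=1}^{k}\varphi_i^{2}\Big\}, \quad k \ge 1, \qquad Z_0 = 1.$$

Then,

1. $\{Z_k\}$ is a discrete martingale.


Proof

$$E[Z_k \mid \mathcal F_{k-1}] = Z_{k-1}\,E\Big[\exp\Big\{\varphi_kw_k - \frac12\varphi_k^{2}\Big\} \,\Big|\, \mathcal F_{k-1}\Big],$$

since $Z_{k-1}$ is $\mathcal F_{k-1}$-measurable,

$$= Z_{k-1},$$

since $\{w_k\}$ is a Gaussian white noise and $\varphi_k$ is $\mathcal F_{k-1}$-measurable. Moreover, $Z_k$ is $\mathcal F_k$-integrable.


Consider the probability $\tilde P$ defined by:

$$d\tilde P(\omega) = Z_K(\omega)\,dP(\omega).$$


Remark A.1

Since $(Z_k)$ is a martingale, it follows that:

$$\frac{d\tilde P}{dP}\Big|_{\mathcal F_k} = Z_k. \qquad (62)$$

2. (Discrete version of Girsanov's theorem)

Under the probability $\tilde P$ defined in (62), the process $\{\tilde w_k\}$ defined by

$$\tilde w_k = w_k - \varphi_k$$

is a Gaussian white noise.


Proof

We want to show that

$$\tilde E[\exp(\lambda\tilde w_k) \mid \mathcal F_{k-1}] = \exp\Big(\frac{\lambda^{2}}{2}\Big), \qquad \forall\lambda \in \mathbb R,$$

i.e.

$$\tilde E\Big[\exp\Big(\lambda\tilde w_k - \frac{\lambda^{2}}{2}\Big) \,\Big|\, \mathcal F_{k-1}\Big] = 1, \qquad \forall\lambda \in \mathbb R.$$

Now,

$$\begin{aligned} \tilde E\Big[\exp\Big(\lambda\tilde w_k - \frac{\lambda^{2}}{2}\Big) \,\Big|\, \mathcal F_{k-1}\Big] &= \frac{E\big[\exp(\lambda\tilde w_k - \frac{\lambda^{2}}{2})Z_K \mid \mathcal F_{k-1}\big]}{E[Z_K \mid \mathcal F_{k-1}]} \qquad \text{(Bayes formula)} \\ &= \frac{E\big[\exp(\lambda w_k - \lambda\varphi_k - \frac{\lambda^{2}}{2})Z_k \mid \mathcal F_{k-1}\big]}{Z_{k-1}}, \end{aligned}$$

since $Z_k$ is a martingale and by the remark above,

$$\begin{aligned} &= E\Big[\exp\Big(\lambda w_k - \lambda\varphi_k - \frac{\lambda^{2}}{2}\Big)\exp\Big(\varphi_kw_k - \frac12\varphi_k^{2}\Big) \,\Big|\, \mathcal F_{k-1}\Big] \\ &= E\Big[\exp\Big\{(\lambda + \varphi_k)w_k - \frac12(\lambda + \varphi_k)^{2}\Big\} \,\Big|\, \mathcal F_{k-1}\Big] \\ &= 1, \quad \text{as in 1.} \end{aligned}$$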
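The discrete Girsanov theorem can be illustrated by Monte Carlo. Under $P$ we draw i.i.d. standard Gaussians $w_1, \ldots, w_K$ and a hypothetical predictable process $\varphi_k = 0.5\tanh(w_{k-1})$, then reweight each sample by $Z_K$. Under the reweighted measure, $\tilde w_K = w_K - \varphi_K$ should have mean 0 and second moment 1, and $E[Z_K]$ should equal 1 (part 1). This is a sketch with arbitrary sample sizes, not a proof.

```python
import math
import random


def girsanov_check(n_samples=100000, K=3, seed=42):
    """Return Monte Carlo estimates of E[Z_K], E[Z_K w~_K], E[Z_K w~_K^2] under P."""
    rng = random.Random(seed)
    sum_z = sum_zw = sum_zw2 = 0.0
    for _ in range(n_samples):
        log_z, w_prev = 0.0, 0.0
        for _ in range(K):
            phi = 0.5 * math.tanh(w_prev)   # predictable: depends on w_{k-1} only (hypothetical choice)
            w = rng.gauss(0.0, 1.0)
            log_z += phi * w - 0.5 * phi * phi
            w_prev = w
        z = math.exp(log_z)
        w_tilde = w - phi                   # w~_K = w_K - phi_K, using the last step of the loop
        sum_z += z
        sum_zw += z * w_tilde
        sum_zw2 += z * w_tilde ** 2
    return sum_z / n_samples, sum_zw / n_samples, sum_zw2 / n_samples


if __name__ == "__main__":
    print(girsanov_check())
```

The three printed numbers should be close to 1, 0 and 1 respectively.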
1 Introduction


The purpose of this paper is to provide numerical approximation schemes for the following abstract stochastic differential equation

$$du_t + Au_t\,dt = \sum_{i=1}^{d}B_iu_t\,dY_t^{i}, \qquad u_0 = \mu, \qquad (1.1)$$


where the operator $A$ is unbounded in some separable Hilbert space $H$. The operators $(B_1, B_2, \ldots, B_d)$, on the other hand, are assumed to be bounded. The most important example of an equation of this type is the Zakai equation of nonlinear filtering.

As in the deterministic case, it is possible to associate a semi-group with equation (1.1). This is actually a stochastic semi-group, in the sense of Skorokhod. The approach adopted here is to build approximations of the stochastic differential equation as approximations of the corresponding semi-group.

In Section 2, the basic definitions and properties of random linear operators and stochastic semi-groups are presented, following Skorokhod. A general approximation theorem for stochastic semi-groups is proved, with error estimates. This theorem can be thought of as an extension to stochastic semi-groups of Theorem 2.2 in Newton [6, p.32], which is itself based on earlier work by Wagner and Platen [9,10]. In Section 3, the existence and uniqueness theorem of [7] is complemented with an abstract regularity result, and the semi-group associated with equation (1.1) is defined. In Section 4, the following time-discretization scheme is investigated

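$$\bar u_{n+1} = P_{t_{n+1}-t_n}\,\Phi_{t_n}^{t_{n+1}}\,\bar u_n, \qquad \bar u_0 = \mu,$$

where $(P_t;\ t \ge 0)$ is the (deterministic) semi-group generated by $-A$, whereas the two-parameter semi-group $(\Phi_s^t : 0 \le s \le t)$ is defined by

$$\Phi_s^t = \exp\Big\{\sum_{i=1}^{d}B_i(Y_t^{i} - Y_s^{i}) - \frac12\sum_{i=1}^{d}B_i^{2}(t - s)\Big\}.$$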


This is a Trotter-like product formula, with the attractive feature that the deterministic and the stochastic parts are decoupled. Using the approximation theorem of Section 2, it is proved that $\bar u_n$ approximates $u_{t_n}$ in the $L^2$-sense, with a speed of convergence of order $O(k)$, where $k$ denotes the time step. In addition, in the context of nonlinear filtering, this time-discretization scheme has a simple probabilistic interpretation, following the approach of [3].
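The splitting step $\bar u_{n+1} = P_k\,\Phi\,\bar u_n$ can be tried on a toy two-dimensional problem. Everything concrete here is a hypothetical choice made for illustration:

- $A = \mathrm{diag}(1, 3)$ and $B = \beta\begin{pmatrix}0&1\\1&0\end{pmatrix}$. Since $B^2 = \beta^2 I$, both $P_k$ and $\Phi$ have closed forms, and $A$ and $B$ do not commute.
- The reference solution comes from a fine-grid Milstein scheme driven by the same Brownian increments.

The average error should decrease as the time step $k$ is refined.

```python
import math
import random

A_DIAG = (1.0, 3.0)  # hypothetical A = diag(1, 3), so P_t = diag(exp(-t), exp(-3t))
BETA = 1.0           # hypothetical B = BETA * [[0, 1], [1, 0]]; B @ B = BETA**2 * I


def apply_B(u):
    return [BETA * u[1], BETA * u[0]]


def split_step(u, k, dY):
    """One step u -> P_k Phi u, with Phi = exp(B dY - B^2 k / 2)."""
    c, s = math.cosh(BETA * dY), math.sinh(BETA * dY)  # exp(BETA*dY*S) = cosh(.) I + sinh(.) S
    damp = math.exp(-0.5 * BETA**2 * k)
    phi_u = [damp * (c * u[0] + s * u[1]), damp * (s * u[0] + c * u[1])]
    return [math.exp(-A_DIAG[0] * k) * phi_u[0], math.exp(-A_DIAG[1] * k) * phi_u[1]]


def milstein_step(u, dt, dY):
    """Fine-grid reference step for du = -A u dt + B u dY (Ito, single noise)."""
    Bu = apply_B(u)
    BBu = apply_B(Bu)
    return [u[i] - A_DIAG[i] * u[i] * dt + Bu[i] * dY + 0.5 * BBu[i] * (dY * dY - dt)
            for i in range(2)]


def mean_error(n_coarse, n_fine=10000, T=1.0, n_paths=20, seed=1):
    """Average |u_ref(T) - u_split(T)| over n_paths paths (same seed => same Brownian paths)."""
    rng = random.Random(seed)
    ratio = n_fine // n_coarse
    dt = T / n_fine
    total = 0.0
    for _ in range(n_paths):
        u_ref, u_split, acc = [1.0, 0.0], [1.0, 0.0], 0.0
        for j in range(n_fine):
            dY = rng.gauss(0.0, math.sqrt(dt))
            u_ref = milstein_step(u_ref, dt, dY)
            acc += dY
            if (j + 1) % ratio == 0:
                u_split = split_step(u_split, T / n_coarse, acc)
                acc = 0.0
        total += math.hypot(u_ref[0] - u_split[0], u_ref[1] - u_split[1])
    return total / n_paths


if __name__ == "__main__":
    for n in (10, 100):
        print(f"k = {1.0 / n}: mean error = {mean_error(n):.5f}")
```

If $A$ and $B$ commuted (e.g. in the scalar case), the splitting would reproduce the exact solution $\exp\{-At + BY_t - \tfrac12B^2t\}\mu$. The error measured here therefore comes from the non-commutativity of $A$ and $B$.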


2  Stochastic  semi-groups 


In the next two subsections, definitions are given concerning random linear operators and stochastic semi-groups, following the work of Skorokhod [11,12]. Unless explicitly stated otherwise, all the vector spaces considered here are separable Hilbert spaces. Let $(\Omega, \mathcal F, P)$ be the underlying probability space.

2.1  Random  linear  operators 


Definition 2.1 A (strong) random linear operator from $F$ into $G$ is a linear and continuous operator from $F$ into $L^2(\Omega, \mathcal F; G)$. The set of all such operators will be denoted by $\mathcal L^2(F,G)$.

Remark. Let $U \in \mathcal L^2(F,G)$. Then only $Ux$, for $x \in F$, is defined as a $G$-valued random variable. In particular, $U$ itself is not a random variable taking values in $\mathcal L(F,G)$. However:

(i) The mapping $x \mapsto E(Ux)$ defines a linear and continuous operator from $F$ into $G$, which will be denoted by $E(U)$, in the following way

$$\forall x \in F, \quad E(U)x = E(Ux). \qquad (2.1)$$

(ii) In the same way, the mapping $(x,y) \mapsto E(Ux, Uy)_G$ is a symmetric and continuous bilinear form on $F \times F$. It uniquely defines a linear, continuous and self-adjoint operator on $F$, which will be denoted by $E(U^*U)$, in the following way

$$\forall x, y \in F, \quad (E(U^*U)x, y)_F = E(Ux, Uy)_G. \qquad (2.2)$$

In particular, this allows one to define the following norm on $\mathcal L^2(F,G)$

$$\|U\|^2_{\mathcal L^2(F,G)} = \|E(U^*U)\|_{\mathcal L(F,F)}. \qquad (2.3)$$

(iii) More generally, let $U_i \in \mathcal L^2(F_i,G_i)$ ($i = 1,2$). For all $C \in \mathcal L(G_1,G_2)$, the mapping $(x,y) \mapsto E(CU_1x, U_2y)_{G_2}$ is a continuous bilinear form on $F_1 \times F_2$. It uniquely defines a linear and continuous operator from $F_1$ to $F_2$, which will be denoted by $E(U_2^*CU_1)$, in the following way

$$\forall x \in F_1,\ \forall y \in F_2, \quad (E(U_2^*CU_1)x, y)_{F_2} = E(CU_1x, U_2y)_{G_2}. \qquad (2.4)$$

Proposition 2.2 The (deterministic) operator defined by

$$\forall C \in \mathcal L(G_1,G_2), \quad \Theta_{U_1,U_2}(C) = E(U_2^*CU_1), \qquad (2.5)$$

is linear and continuous from $\mathcal L(G_1,G_2)$ into $\mathcal L(F_1,F_2)$.
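Proof. By definition, for all $x \in F_1$ and $y \in F_2$,

$$|(\Theta_{U_1,U_2}(C)x, y)_{F_2}| = |E(CU_1x, U_2y)_{G_2}| \le \|C\|_{\mathcal L(G_1,G_2)}\big[E|U_1x|_{G_1}^{2}\big]^{1/2}\big[E|U_2y|_{G_2}^{2}\big]^{1/2} \le \|C\|_{\mathcal L(G_1,G_2)}\,\|U_1\|_{\mathcal L^2(F_1,G_1)}\,\|U_2\|_{\mathcal L^2(F_2,G_2)}\,|x|_{F_1}|y|_{F_2},$$

from which it follows that

$$\|\Theta_{U_1,U_2}(C)\|_{\mathcal L(F_1,F_2)} \le \|C\|_{\mathcal L(G_1,G_2)}\,\|U_1\|_{\mathcal L^2(F_1,G_1)}\,\|U_2\|_{\mathcal L^2(F_2,G_2)}, \qquad (2.6)$$

which proves the assertion.

□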


Remark. Estimate (2.6) may not be tight enough. In that case, note that for all $x \in F_1$ and $y \in F_2$

$$(\Theta_{U_1,U_2}(C)x, y)_{F_2} = E(CU_1x, U_2y)_{G_2} = (CE(U_1)x, E(U_2)y)_{G_2} + E\big(C(U_1 - E(U_1))x, (U_2 - E(U_2))y\big)_{G_2},$$

which gives

$$\|\Theta_{U_1,U_2}(C)\|_{\mathcal L(F_1,F_2)} \le \|C\|_{\mathcal L(G_1,G_2)}\Big\{\|E(U_1)\|_{\mathcal L(F_1,G_1)}\,\|E(U_2)\|_{\mathcal L(F_2,G_2)} + \|U_1 - E(U_1)\|_{\mathcal L^2(F_1,G_1)}\,\|U_2 - E(U_2)\|_{\mathcal L^2(F_2,G_2)}\Big\}. \qquad (2.7)$$


Definition 2.3 Let $\mathcal B \subset \mathcal F$ be a σ-algebra. A random linear operator $U \in \mathcal L^2(F,G)$ is $\mathcal B$-measurable if and only if, for all $x \in F$, the $G$-valued random variable $Ux$ is $\mathcal B$-measurable.


With  this  definition,  it  is  possible  to  apply  a  random  linear  operator  to  a  random  vector, 
provided  they  are  mutually  independent.  Indeed 

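Proposition 2.4 Let $U \in \mathcal L^2(F,G)$, and let $\mathcal A, \mathcal B \subset \mathcal F$ be two mutually independent σ-algebras. If $U$ is $\mathcal B$-measurable, then it can be extended, with the same norm, to a linear and continuous operator from $L^2(\Omega, \mathcal A; F)$ into $L^2(\Omega, \mathcal A \vee \mathcal B; G)$.


Remark. In addition, the mapping $x \mapsto E(Ux \mid \mathcal A)$ defines a linear and continuous operator from $L^2(\Omega, \mathcal A; F)$ into $L^2(\Omega, \mathcal A; G)$, which coincides with $E(U)$.

Proof. First, let $x \in L^2(\Omega, \mathcal A; F)$ be simple, i.e.

$$x = \sum_{i=1}^{n}a_i\,1_{A_i}, \quad \text{with } a_i \in F \text{ and pairwise disjoint } A_i \in \mathcal A.$$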
i=l 
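It is natural to set

$$Ux = \sum_{i=1}^{n}Ua_i\,1_{A_i},$$

which gives, using the disjointness of the $A_i$ and the independence of $\mathcal A$ and $\mathcal B$,

$$E(|Ux|^{2}) = \sum_{i=1}^{n}E(|Ua_i|^{2}1_{A_i}) = \sum_{i=1}^{n}E(|Ua_i|^{2})P(A_i) \le \|U\|^2_{\mathcal L^2(F,G)}\sum_{i=1}^{n}|a_i|^{2}P(A_i) = \|U\|^2_{\mathcal L^2(F,G)}\,E|x|^{2}.$$

The same inequality holds for all $x \in L^2(\Omega, \mathcal A; F)$, by density of the simple random variables. For the reverse inequality, it is enough to remark that

$$\sup_{0 \ne x \in L^2(\Omega, \mathcal A; F)}\frac{E|Ux|^{2}}{E|x|^{2}} \ge \sup_{0 \ne a \in F}\frac{E|Ua|^{2}}{|a|_F^{2}}.$$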


Proposition 2.4 allows one to define the product of mutually independent random linear operators. Indeed, let $U \in \mathcal L^2(F,G)$ and $V \in \mathcal L^2(G,H)$. If $U$ and $V$ are mutually independent, then the product operator $VU$ can be defined as an element of $\mathcal L^2(F,H)$. Moreover

$$\|VU\|_{\mathcal L^2(F,H)} \le \|U\|_{\mathcal L^2(F,G)}\,\|V\|_{\mathcal L^2(G,H)}.$$

The purpose of the next proposition is to prove a morphism property for the mapping $(U_1,U_2) \mapsto \Theta_{U_1,U_2}$ defined by (2.4) and (2.5).

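Proposition 2.5 Let $U_i \in \mathcal L^2(F_i,G_i)$ and $V_i \in \mathcal L^2(G_i,H_i)$ ($i = 1,2$). Assume that $(U_1,U_2)$ and $(V_1,V_2)$ are mutually independent. Then

$$\Theta_{V_1U_1,V_2U_2} = \Theta_{U_1,U_2} \circ \Theta_{V_1,V_2}. \qquad (2.8)$$

Proof. Let $C \in \mathcal L(H_1,H_2)$. By definition, for all $x \in F_1$ and $y \in F_2$,

$$\begin{aligned} (\Theta_{V_1U_1,V_2U_2}(C)x, y)_{F_2} &= E(CV_1U_1x, V_2U_2y)_{H_2} \\ &= E\big(E(V_2^*CV_1)U_1x, U_2y\big)_{G_2} \\ &= \big(E(U_2^*\,\Theta_{V_1,V_2}(C)\,U_1)x, y\big)_{F_2} \\ &= \big(\Theta_{U_1,U_2}(\Theta_{V_1,V_2}(C))x, y\big)_{F_2}, \end{aligned}$$

where the second equality uses the independence of $(V_1,V_2)$ and $(U_1,U_2)$.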
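The morphism property (2.8) can be checked exactly on a finite probability space. In this hypothetical example, $(U_1,U_2)$ depends on a fair coin $\omega_1$ and $(V_1,V_2)$ on an independent fair coin $\omega_2$. All operators are random real $2\times2$ matrices, so adjoints are transposes, and $\Theta_{U_1,U_2}(C) = E(U_2^{T}CU_1)$ is computed as an exact average over outcomes.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def theta(outcomes, c):
    """Theta_{U1,U2}(C) = E(U2^T C U1) over equally likely outcomes [(U1, U2), ...]."""
    acc = None
    for u1, u2 in outcomes:
        term = matmul(transpose(u2), matmul(c, u1))
        acc = term if acc is None else [[x + y for x, y in zip(r, s)] for r, s in zip(acc, term)]
    n = len(outcomes)
    return [[x / n for x in row] for row in acc]


def random_matrix(rng, n=2):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


rng = random.Random(7)
U = [(random_matrix(rng), random_matrix(rng)) for _ in range(2)]  # outcomes of omega1
V = [(random_matrix(rng), random_matrix(rng)) for _ in range(2)]  # outcomes of omega2 (independent)
C = random_matrix(rng)

# Left side: (V1 U1, V2 U2) over the four equally likely outcomes (omega1, omega2).
lhs = theta([(matmul(v1, u1), matmul(v2, u2)) for (u1, u2) in U for (v1, v2) in V], C)
rhs = theta(U, theta(V, C))
print(max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb)))
```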
2.2 Stochastic semi-groups


Definition 2.6 A (strong) stochastic semi-group in $H$ is a two-parameter family $(U_s^t : 0 \le s \le t)$ of (strong) random linear operators in $H$, satisfying

(i) for all $s \le t \le u \le v$, $U_s^t$ and $U_u^v$ are mutually independent,

(ii) for all $s \le t \le u$, $U_s^u = U_t^uU_s^t$.

The stochastic semi-group is strongly continuous if

(iii) $\forall x \in H$, $\lim_{t \downarrow s}E|U_s^tx - x|_H^{2} = 0$.


Remark. The independence property (i) makes it possible to define the product operators appearing in the semi-group property (ii).


The next Proposition gives a sufficient condition for the independence hypothesis (i) to hold.

Proposition 2.7 Assume that, for all $s \le t$, $U_s^t$ is $\mathcal Y_s^t$-measurable, where the two-parameter family $(\mathcal Y_s^t : 0 \le s \le t)$ of σ-algebras satisfies

(iv) for all $s \le t \le u \le v$,

$\mathcal Y_s^t$ and $\mathcal Y_u^v$ are mutually independent,

$\mathcal Y_s^t \vee \mathcal Y_t^u \subset \mathcal Y_s^u$.

Then (i) holds.

Proposition 2.8 The two-parameter family $(Q_s^t : 0 \le s \le t)$ of bounded linear (deterministic) operators in $H$, defined by $Q_s^t = E(U_s^t)$, is a non-homogeneous strongly continuous semi-group in $H$.

Definition 2.9 A discrete stochastic semi-group in $H$ is given by a family $(U_{i+1} : i = 0, 1, \ldots)$ of random linear operators in $H$ such that, for all $j < i$, $U_{i+1}$ and $U_{j+1}$ are mutually independent. The semi-group itself is the two-parameter family $(U_l^m : 0 \le l \le m)$ defined by

$$U_l^m = \prod_{i=l}^{m-1}U_{i+1} = U_mU_{m-1}\cdots U_{l+1}.$$

By convention, $U_l^l = I$.
2.3 Approximation of stochastic semi-groups

The next Theorem extends to stochastic semi-groups Theorem 2.2 in Newton [6, p.32], which is itself based on earlier work by Wagner and Platen [9,10].

Roughly speaking, this result says the following. Suppose the one-step error between two stochastic semi-groups is of order $O(k^{(\gamma+1)/2})$, and the expected value of the one-step error is of order $O(k^{\gamma/2+1})$. Then the overall error is of order $O(k^{\gamma/2})$. All estimates are understood in the $L^2$-sense.

To be specific, let $\pi : 0 = t_0 < t_1 < \cdots < t_i < \cdots < t_n = T$ be a partition of the interval $[0,T]$, with mesh size $k$.


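Theorem 2.10 Let $(U_l^m;\ 0 \le l \le m)$ and $(V_l^m;\ 0 \le l \le m)$ be two discrete stochastic semi-groups in $H$. Suppose that $U_{i+1} \in \mathcal L^2(D,D)$, where $D \subset H$.

Suppose that the following stability estimates hold for the two discrete stochastic semi-groups:

$$\|U_l^m\|^2_{\mathcal L^2(D,D)} \le e^{\beta_D(t_m - t_l)}, \qquad (2.9)$$

$$\|E(U_{i+1})\|_{\mathcal L(D,D)} \le 1, \qquad (2.10)$$

and

$$\|V_l^m\|^2_{\mathcal L^2(H,H)} \le e^{\gamma_H(t_m - t_l)}, \qquad (2.11)$$

$$\|E(V_{i+1})\|_{\mathcal L(H,H)} \le 1. \qquad (2.12)$$

Suppose also that the following consistency estimates hold for the one-step error $\delta_{i+1} = U_{i+1} - V_{i+1}$:

$$\|\delta_{i+1}\|_{\mathcal L^2(D,H)} \le a_0(t_{i+1} - t_i)^{(\gamma+1)/2}, \qquad (2.13)$$

$$\|E(\delta_{i+1})\|_{\mathcal L(D,H)} \le a_1(t_{i+1} - t_i)^{\gamma/2+1}. \qquad (2.14)$$

Then the overall error satisfies

$$\|U_0^n - V_0^n\|_{\mathcal L^2(D,H)} = O(k^{\gamma/2}).$$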


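Proof. We start from the decomposition

$$U_0^n - V_0^n = \sum_{i=0}^{n-1}\big(V_{i+1}^nU_0^{i+1} - V_i^nU_0^i\big) = \sum_{i=0}^{n-1}V_{i+1}^n\,\delta_{i+1}\,U_0^i.$$

Introducing $\Delta_i = V_{i+1}^n\,\delta_{i+1}\,U_0^i$, it follows that

$$\forall\theta \in D, \quad E|U_0^n\theta - V_0^n\theta|_H^{2} = \sum_{i=0}^{n-1}E|\Delta_i\theta|_H^{2} + 2\sum_{i=0}^{n-1}\sum_{j=0}^{i-1}E(\Delta_i\theta, \Delta_j\theta)_H. \qquad (2.15)$$

The rest of the proof is divided into two parts.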
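□ Analysis of square terms
By Proposition 2.5, for all $\theta \in D$,

$$E|\Delta_i\theta|_H^{2} = \Big(\Theta_{U_0^i,U_0^i}\big(\Theta_{\delta_{i+1},\delta_{i+1}}\big(\Theta_{V_{i+1}^n,V_{i+1}^n}(I)\big)\big)\theta, \theta\Big)_D.$$

Each of these three components is studied separately, using Proposition 2.2 and estimate (2.6).

• $V_{i+1}^n \in \mathcal L^2(H,H)$. Therefore, for all $C \in \mathcal L(H,H)$,

$$\Theta_{V_{i+1}^n,V_{i+1}^n}(C) \in \mathcal L(H,H), \quad \text{and} \quad \|\Theta_{V_{i+1}^n,V_{i+1}^n}(C)\|_{\mathcal L(H,H)} \le e^{\gamma_H(t_n - t_{i+1})}\,\|C\|_{\mathcal L(H,H)}.$$

• $\delta_{i+1} \in \mathcal L^2(D,H)$. Therefore, for all $C \in \mathcal L(H,H)$,

$$\Theta_{\delta_{i+1},\delta_{i+1}}(C) \in \mathcal L(D,D), \quad \text{and} \quad \|\Theta_{\delta_{i+1},\delta_{i+1}}(C)\|_{\mathcal L(D,D)} \le \|C\|_{\mathcal L(H,H)}\,a_0^{2}(t_{i+1} - t_i)^{\gamma+1}.$$

• $U_0^i \in \mathcal L^2(D,D)$. Therefore, for all $C \in \mathcal L(D,D)$,

$$\Theta_{U_0^i,U_0^i}(C) \in \mathcal L(D,D), \quad \text{and} \quad \|\Theta_{U_0^i,U_0^i}(C)\|_{\mathcal L(D,D)} \le \|C\|_{\mathcal L(D,D)}\,\|U_0^i\|^2_{\mathcal L^2(D,D)} \le e^{\beta_Dt_i}\,\|C\|_{\mathcal L(D,D)}.$$

It follows from this first analysis that, for all $\theta \in D$,

$$E|\Delta_i\theta|_H^{2} \le a_0^{2}(t_{i+1} - t_i)^{\gamma+1}e^{\beta_Dt_i + \gamma_H(t_n - t_{i+1})}|\theta|_D^{2} \le e^{cT}a_0^{2}k^{\gamma+1}|\theta|_D^{2},$$

where $c = \max(\beta_D, \gamma_H)$.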
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O  Analysis  of  double-product  terms 

Consider  the  following  partition  of  the  interval  [0,  T\  build  from  the  original  partition  ir,  under 
the  assumption  that  j  <  t 


0  —  to  ^i+i  ti  t<+i  tn  —  T 


Using  this  new  partition 

A.  =  sU,  Ui , 

A.  =  vU,  u, . 

Therefore,  by  Proposition  2.5,  for  all  &  €  i? 

Each  of  these  five  components  will  be  studied  separately,  making  use  of  Proposition  2.2  and 
estimates  (2.6),  (2.7). 

•  V'+^  €  ,  therefore  for  all  C  €  C{H,H) 

and 

•  ^j+1  e  C]{D,H)  and  ,  therefore  for  all  C  €  C(H,H) 

and 

il®f;^,,v;*^,(^)ll£(£l,W)  <  \\C\\c{H,H)  {ll^(^t’+l)ll£(D,//)  l|E(K+i)llc(£),/f) 

S  l|C'||£{ff.;^){al(^+l  -fi)’’^*'  * 

+oo(«.+i  -  t,)(^+'>/2(e^H<‘'+>-‘')  -  1)*/*} 
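• $U_{t_{j+1}}^{t_i} \in L^2(D,D)$ and $V_{t_{j+1}}^{t_i} \in L^2(H,H)$, therefore for all $C \in L(D,H)$

$$\Theta_{U_{t_{j+1}}^{t_i},V_{t_{j+1}}^{t_i}}(C) \in L(D,H) ,$$

and

$$\|\Theta_{U_{t_{j+1}}^{t_i},V_{t_{j+1}}^{t_i}}(C)\|_{L(D,H)} \le \|C\|_{L(D,H)}\,\|U_{t_{j+1}}^{t_i}\|_{L^2(D,D)}\,\|V_{t_{j+1}}^{t_i}\|_{L^2(H,H)} \le \|C\|_{L(D,H)}\,e^{\frac12(\beta_D + \gamma_H)(t_i - t_{j+1})} .$$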

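• $U_{t_j}^{t_{j+1}} \in L^2(D,D)$ and $\Sigma_{j+1} \in L^2(D,H)$, therefore for all $C \in L(D,H)$

$$\Theta_{U_{t_j}^{t_{j+1}},\Sigma_{j+1}}(C) \in L(D,D) ,$$

and

$$\|\Theta_{U_{t_j}^{t_{j+1}},\Sigma_{j+1}}(C)\|_{L(D,D)} \le \|C\|_{L(D,H)}\,\big\{\|E(U_{t_j}^{t_{j+1}})\|_{L(D,D)}\,\|E(\Sigma_{j+1})\|_{L(D,H)} + \|U_{t_j}^{t_{j+1}} - E(U_{t_j}^{t_{j+1}})\|_{L^2(D,D)}\,\|\Sigma_{j+1}\|_{L^2(D,H)}\big\}$$

$$\le \|C\|_{L(D,H)}\,\big\{\alpha_1\,(t_{j+1} - t_j)^{1+\gamma} + \alpha_0\,(t_{j+1} - t_j)^{(1+2\gamma)/2}\,\big(e^{\beta_D(t_{j+1} - t_j)} - 1\big)^{1/2}\big\} .$$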

• $U_j \in L^2(D,D)$, therefore for all $C \in L(D,D)$

$$\Theta_{U_j,U_j}(C) \in L(D,D) ,$$

and

$$\|\Theta_{U_j,U_j}(C)\|_{L(D,D)} \le \|C\|_{L(D,D)}\,\|U_j\|^2_{L^2(D,D)} \le e^{\beta_D t_j}\,\|C\|_{L(D,D)} .$$

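It follows from this second analysis, using $e^x - 1 \le x\,e^x$ for $x \ge 0$, that for all $\theta \in D$

$$|E(A_i\theta, A_j\theta)_H| \le \big\{\alpha_1 + \alpha_0\,\beta_D^{1/2}\,e^{\frac12\beta_D(t_{j+1} - t_j)}\big\}\,\big\{\alpha_1 + \alpha_0\,\gamma_H^{1/2}\,e^{\frac12\gamma_H(t_{i+1} - t_i)}\big\}\,e^{cT}\,(t_{j+1} - t_j)^{1+\gamma}\,(t_{i+1} - t_i)^{1+\gamma}\,|\theta|_D^2$$

$$\le \big\{\alpha_1 + \alpha_0\,c^{1/2}\,e^{ck/2}\big\}^2\,e^{cT}\,k^{2+2\gamma}\,|\theta|_D^2 ,$$

where $c = \max(\beta_D, \gamma_H)$.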
□ Conclusion


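Each of the square terms in (2.15) is of order $O(k^{1+2\gamma})$, and there are $n$ such terms. On the other hand, each of the double-product terms in (2.15) is of order $O(k^{2+2\gamma})$, and there are $n(n-1)/2$ such terms. Therefore, for all $\theta \in D$

$$E|U_n\theta - V_n\theta|_H^2 \le \alpha_0^2\,T\,e^{cT}\,k^{2\gamma}\,|\theta|_D^2 + \big\{\alpha_1 + \alpha_0\,c^{1/2}\,e^{ck/2}\big\}^2\,T^2\,e^{cT}\,k^{2\gamma}\,|\theta|_D^2 .$$

In other words

$$\|U_n - V_n\|_{L^2(D,H)} \le C(T)\,k^{\gamma} .$$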

□ 

The rest of this paper is devoted to the application of the approximation theorem to the time-discretization of bilinear stochastic PDEs, such as the Zakai equation of nonlinear filtering.
3 Bilinear stochastic PDE


The purpose of this Section is to study the following abstract equation

$$du_t + Au_t\,dt = \sum_{i=1}^d B_iu_t\,dY_t^i , \qquad u_0 = \mu , \qquad (3.1)$$

under the hypotheses listed below.

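Hypotheses

• $(Y_t\,;\ t \ge 0)$ is a $d$-dimensional standard Wiener process, defined on an underlying probability space $(\Omega, \mathcal{F}, P)$. In particular, the $\sigma$-algebras

$$\mathcal{Y}_t = \sigma(Y_s,\ 0 \le s \le t) , \qquad \mathcal{Y}_s^t = \sigma(Y_r - Y_s\,;\ s \le r \le t) ,$$

satisfy, for all $s \le t \le u \le v$,

$$\mathcal{Y}_s^t \text{ and } \mathcal{Y}_u^v \text{ are mutually independent} , \qquad \mathcal{Y}_s^t \vee \mathcal{Y}_t^u \subset \mathcal{Y}_s^u .$$

• Let $V$ and $H$ be two separable Hilbert spaces, with $H$ identified with its dual, and $V$ densely and continuously included in $H$. $|\cdot|$ and $\|\cdot\|$ will denote the norm in $H$ and $V$ respectively, and $\langle\cdot,\cdot\rangle$ the duality product between $V$ and $V'$.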

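Hypothesis [A]: The operator $A \in L(V,V')$ is an unbounded linear operator in $H$. In addition, there are constants $\lambda \ge 0$ and $\alpha > 0$ such that for all $u \in V$

$$\langle Au, u\rangle + \lambda|u|^2 \ge \alpha\|u\|^2 .$$

There is no loss in generality in assuming that $\lambda = 0$ (replace $u_t$ by $e^{-\lambda t}u_t$, which solves (3.1) with $A$ replaced by $A + \lambda I$), i.e.

$$\langle Au, u\rangle \ge \alpha\|u\|^2 . \qquad (3.2)$$

It follows that $-A$ generates a strongly continuous semi-group $(P_t\,;\ t \ge 0)$ of bounded linear operators in $H$, and

$$\|P_t\|_{L(H,H)} \le 1 . \qquad (3.3)$$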

Moreover, $A$ has a square root, whose inverse is given by the formula [2, p.282]

$$A^{-1/2} = \frac{1}{\pi}\int_0^\infty \lambda^{-1/2}\,(A + \lambda I)^{-1}\,d\lambda . \qquad (3.4)$$
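Formula (3.4) can be checked numerically in finite dimension. The sketch below assumes that a symmetric positive definite matrix stands in for $A$. The substitution $\lambda = \tan^2 u$ turns (3.4) into $\frac{2}{\pi}\int_0^{\pi/2}(\cos^2 u\,A + \sin^2 u\,I)^{-1}\,du$, which has a smooth, bounded integrand, so plain Gauss-Legendre quadrature is enough. The function name and the test matrix are illustrative choices, not taken from the text.

```python
import numpy as np


def inv_sqrt_balakrishnan(A, nodes=200):
    """Approximate A^{-1/2} = (1/pi) int_0^inf lam^{-1/2} (A + lam I)^{-1} dlam.

    With lam = tan(u)^2 the integral becomes
    (2/pi) int_0^{pi/2} (cos(u)^2 A + sin(u)^2 I)^{-1} du,
    whose integrand is smooth and bounded. Assumes A symmetric positive definite.
    """
    x, w = np.polynomial.legendre.leggauss(nodes)   # Gauss-Legendre rule on [-1, 1]
    u = (x + 1.0) * np.pi / 4.0                      # map the nodes to [0, pi/2]
    w = w * np.pi / 4.0
    eye = np.eye(A.shape[0])
    total = np.zeros(A.shape)
    for uk, wk in zip(u, w):
        total += wk * np.linalg.inv(np.cos(uk) ** 2 * A + np.sin(uk) ** 2 * eye)
    return (2.0 / np.pi) * total


M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 3.0]])
R = inv_sqrt_balakrishnan(M)

lam, Q = np.linalg.eigh(M)
R_exact = (Q / np.sqrt(lam)) @ Q.T      # Q diag(lam^{-1/2}) Q^T
print(np.allclose(R, R_exact), np.allclose(R @ R @ M, np.eye(3)))
```

The same integral representation is what makes $A^{-1/2}$ (and hence $A^{1/2}$) commute with $A$ and its resolvents, which is used later in the proof of Proposition 3.2.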

For every integer $r$, introduce $D^r = D(A^{r/2})$. These spaces are all separable Hilbert spaces, with norm $|\cdot|_r$. It is assumed that

$$D(A) = D(A^*) , \qquad (3.5)$$
where $A^*$ denotes the adjoint of $A$ in $L(V,V')$. According to [5], this is a sufficient condition for

$$D(A^{*1/2}) = D(A^{1/2}) = V \qquad (3.6)$$

to hold.


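• Hypothesis [B(0)]: The operators $B_i \in L(H,H)$, $1 \le i \le d$, and by definition

$$\beta_0 = \Big(\sum_{i=1}^d \|B_i\|^2_{L(H,H)}\Big)^{1/2} < +\infty . \qquad (3.7)$$

In the sequel, these operators may be required to be more regular, e.g. to satisfy, for some integer $r$,

Hypothesis [B(r)]: The operators $B_i \in L(D^r,D^r)$, $1 \le i \le d$, in which case by definition

$$\beta_r = \Big(\sum_{i=1}^d \|B_i\|^2_{L(D^r,D^r)}\Big)^{1/2} < +\infty . \qquad (3.8)$$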


• Finally, given any separable Hilbert space $F$, $M^2(0,T;F)$ will denote the subspace of those elements of $L^2(\Omega\times[0,T];F)$ that are adapted to the filtration $(\mathcal{Y}_t : 0 \le t \le T)$.


3.1 Existence, uniqueness and regularity results

The following theorem [7] proves existence and uniqueness of a solution to (3.1).

Theorem 3.1 Assume [A], [B(0)] and $\mu \in H$. Then equation (3.1) has a unique solution $u \in M^2(0,T;V)$, which satisfies

(i) $u \in L^2(\Omega; C([0,T];H))$,

(ii) $|u_t|^2 + 2\int_0^t \langle Au_s, u_s\rangle\,ds = |\mu|^2 + 2\sum_{i=1}^d \int_0^t (B_iu_s, u_s)\,dY_s^i + \sum_{i=1}^d \int_0^t |B_iu_s|^2\,ds$.

Moreover the following estimate holds, with $\beta_0$ defined by (3.7)

$$E|u_t|^2 \le e^{\beta_0^2 t}\,|\mu|^2 . \qquad (3.9)$$
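Estimate (3.9) can be illustrated in the simplest scalar analogue of (3.1). The assumptions are $H = V = \mathbb{R}$, $A = a > 0$ acting by multiplication and a single $B_1 = b$, so that $\beta_0 = |b|$. For this linear SDE the second moment is known in closed form, $E u_t^2 = e^{(b^2 - 2a)t}\mu^2$, which lies below the bound $e^{\beta_0^2 t}\mu^2$. The Monte Carlo sketch below uses the Euler-Maruyama scheme. The helper `second_moment_mc` and all parameter values are illustrative.

```python
import numpy as np


def second_moment_mc(a, b, mu, T, steps, paths, seed=0):
    """Euler-Maruyama Monte Carlo estimate of E|u_T|^2 for du = -a u dt + b u dY."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    u = np.full(paths, mu, dtype=float)
    for _ in range(steps):
        dY = rng.normal(0.0, np.sqrt(dt), size=paths)
        u = u + (-a * u) * dt + b * u * dY
    return np.mean(u ** 2)


a, b, mu, T = 1.0, 0.5, 1.0, 1.0
est = second_moment_mc(a, b, mu, T, steps=1000, paths=20000)
exact = np.exp((b ** 2 - 2 * a) * T) * mu ** 2   # closed form for this scalar case
bound = np.exp(b ** 2 * T) * mu ** 2             # right-hand side of (3.9), beta_0 = |b|
print(est, exact, bound)
```

The gap between `exact` and `bound` comes from dropping the dissipative term $2\int_0^t\langle Au_s,u_s\rangle\,ds$ in (ii) before applying Gronwall's lemma.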


With additional assumptions on both the initial condition $\mu$ and the operators $B_i$, $1 \le i \le d$, the following regularity result holds for the solution of equation (3.1).

Proposition 3.2 Assume [A] and [B(0)]. Assume [B(r)] and $\mu \in D^r$ for some integer $r$. Then the unique solution $u$ of equation (3.1) satisfies

$$u \in M^2(0,T;D^{r+1}) \cap L^2(\Omega; C([0,T];D^r)) .$$


Proof. The proof presented here is adapted from [1]. In fact, it will be proved by induction with respect to $r$ that

if [B(r)] holds and $\mu \in D^r$, then $v = A^{r/2}u$ is the unique solution in $M^2(0,T;V)$ of

$$dv_t + Av_t\,dt = \sum_{i=1}^d A^{r/2}B_iu_t\,dY_t^i , \qquad v_0 = A^{r/2}\mu , \qquad (3.10)$$

and therefore satisfies

(i) $v \in L^2(\Omega; C([0,T];H))$,

(ii) $|v_t|^2 + 2\int_0^t \langle Av_s, v_s\rangle\,ds = |A^{r/2}\mu|^2 + 2\sum_{i=1}^d \int_0^t (A^{r/2}B_iu_s, v_s)\,dY_s^i + \sum_{i=1}^d \int_0^t |A^{r/2}B_iu_s|^2\,ds$.

The assertion holds for $r = 0$, by Theorem 3.1.

Suppose now that it holds for a given integer $r$, and assume that [B(r+1)] holds and $\mu \in D^{r+1}$. A fortiori $\mu \in D^r$, and [B(r)] also holds by interpolation. By the induction hypothesis, it follows that $u \in M^2(0,T;D^{r+1})$.

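Next, define $J_\varepsilon = (I + \varepsilon A^{1/2})^{-1}$ and $J_\varepsilon^* = (I + \varepsilon A^{*1/2})^{-1}$. Property (3.6) implies that both $J_\varepsilon$ and $J_\varepsilon^*$ belong to $L(H,V)$. Therefore, $J_\varepsilon$ itself belongs to each of the three spaces $L(V',V')$, $L(H,H)$ and $L(V,V)$, and so does $A^{1/2}J_\varepsilon = (I - J_\varepsilon)/\varepsilon$. Now, for all $w \in V$

$$A^{1/2}J_\varepsilon Aw = AA^{1/2}J_\varepsilon w \qquad (3.11)$$

since this equality obviously holds for any $w \in D(A)$ by (3.4), $D(A)$ is dense in $V$, and the operators on both sides of (3.11) belong to $L(V,V')$.

Define next $v^\varepsilon = A^{1/2}J_\varepsilon v$, where $v$ is the unique solution of (3.10). First $v^\varepsilon = J_\varepsilon A^{(r+1)/2}u$, and since $u \in M^2(0,T;D^{r+1})$, it follows that

$$v^\varepsilon \longrightarrow A^{(r+1)/2}u \quad \text{in } M^2(0,T;H) . \qquad (3.12)$$

On the other hand, since $A^{1/2}J_\varepsilon \in L(V',V')$, it follows that $v^\varepsilon$ satisfies

$$dv_t^\varepsilon + A^{1/2}J_\varepsilon Av_t\,dt = \sum_{i=1}^d A^{1/2}J_\varepsilon A^{r/2}B_iu_t\,dY_t^i , \qquad v_0^\varepsilon = A^{1/2}J_\varepsilon A^{r/2}\mu . \qquad (3.13)$$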
Using (3.11) gives

$$dv_t^\varepsilon + Av_t^\varepsilon\,dt = \sum_{i=1}^d J_\varepsilon\phi_t^i\,dY_t^i , \qquad \text{with } \phi_t^i = A^{(r+1)/2}B_iu_t .$$

Since $B_i \in L(D^{r+1},D^{r+1})$ for $1 \le i \le d$, and $u \in M^2(0,T;D^{r+1})$, it follows that $\phi^i \in M^2(0,T;H)$ and therefore $J_\varepsilon\phi^i \to \phi^i$ in $M^2(0,T;H)$. Also $v_0^\varepsilon \to A^{(r+1)/2}\mu$ in $H$. It is now easy to prove, along the lines of the proof of Theorem 1.1 in [7], that any subsequence of $(v^\varepsilon : \varepsilon > 0)$ is a Cauchy sequence in both $M^2(0,T;V)$ and $L^2(\Omega; C([0,T];H))$. But in view of (3.12) the limit has to be $A^{(r+1)/2}u$. □

Moreover the following estimate holds, with $\beta_r$ defined by (3.8)

$$E|u_t|_r^2 \le e^{\beta_r^2 t}\,|\mu|_r^2 . \qquad (3.14)$$

3.2 Associated stochastic semi-group

Theorem 3.1 allows us to define a two-parameter family $(U_s^t : 0 \le s \le t)$ of linear operators from $H$ into $L^2(\Omega;H)$ in the following way: for all $\mu \in H$, $U_s^t\mu$ is the value at time $t$ of the unique solution of equation (3.1) starting from the initial condition $\mu$ at time $s$.

Obviously, the following properties hold

• the linear operator $U_s^t$ is continuous from $H$ into $L^2(\Omega;H)$, by estimate (3.9),

• for all $\mu \in H$, the random variable $U_s^t\mu$ is $\mathcal{Y}_s^t$-measurable,

• for all $s \le t \le u$, $U_s^u = U_t^u\,U_s^t$,

• for all $\mu \in H$, $\lim_{t \downarrow s} E|U_s^t\mu - \mu|^2 = 0$, by property (i) of Theorem 3.1.

Therefore

Proposition 3.3 The two-parameter family $(U_s^t : 0 \le s \le t)$ is a strongly continuous stochastic semi-group in $H$, with first two moments

$$E(U_s^t) = P_{t-s} , \qquad (3.15)$$

$$\|U_s^t\|^2_{L^2(H,H)} \le e^{\beta_0^2(t-s)} , \qquad (3.16)$$

where $(P_t : t \ge 0)$ is the (deterministic) semi-group generated by $-A$, and $\beta_0$ is defined by (3.7).

Again, with additional assumptions on both the initial condition $\mu$ and the operators $B_i$, $1 \le i \le d$, more precise results are available.

Proposition 3.4 If (3.8) holds for some integer $r$, then the two-parameter family $(U_s^t : 0 \le s \le t)$ is a strongly continuous stochastic semi-group in $D^r$. Moreover

$$\|U_s^t\|^2_{L^2(D^r,D^r)} \le e^{\beta_r^2(t-s)} , \qquad (3.17)$$

where $\beta_r$ is defined by (3.8).


Remark. Estimates (3.16) and (3.17) are mere restatements of estimates (3.9) and (3.14) respectively.

From the perturbation representation

$$U_s^t = P_{t-s} + \sum_{i=1}^d \int_s^t P_{t-r}\,B_i\,U_s^r\,dY_r^i , \qquad (3.18)$$

the following estimates are easily derived

$$\|U_s^t - E(U_s^t)\|_{L^2(H,H)} \le \big(e^{\beta_0^2(t-s)} - 1\big)^{1/2}$$

(resp. $\|U_s^t - E(U_s^t)\|_{L^2(D^r,D^r)} \le (e^{\beta_r^2(t-s)} - 1)^{1/2}$), using (3.3) and (3.7), (3.16) (resp. (3.8), (3.17)).
4 Some product formulas

The purpose of this Section is to propose and study time-discretization schemes for equation (3.1)

$$du_t + Au_t\,dt = \sum_{i=1}^d B_iu_t\,dY_t^i , \qquad u_0 = \mu .$$

Remark first that two operators are involved in this equation

• the unbounded operator $A$, with a deterministic contribution,

• the bounded operators $(B_1, B_2, \ldots, B_d)$, with a stochastic contribution.

If only $A$ were present (i.e. $B_1 = B_2 = \cdots = B_d = 0$), then the associated semi-group would be $(P_t : t \ge 0)$, i.e. the semi-group generated by $-A$. On the other hand, if only $(B_1, B_2, \ldots, B_d)$ were present (i.e. $A = 0$), then, provided the operators $B_i$ commute with one another (as multiplication operators do in the Zakai equation), the associated two-parameter semi-group would be

$$\Psi_s^t = \exp\Big[\sum_{i=1}^d B_i\,(Y_t^i - Y_s^i) - \frac12\sum_{i=1}^d B_i^2\,(t-s)\Big] . \qquad (4.1)$$
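Formula (4.1) can be checked numerically. The sketch below assumes that the $B_i$ are diagonal matrices, a special case of commuting operators. It verifies the moment identity $E(\Psi_s^t) = I$, stated as (4.3) below, by exact Gauss-Hermite integration over the Gaussian increments $Y_t^i - Y_s^i \sim N(0, t-s)$. The matrices, the step `dt` and the helper `psi` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def psi(B_list, dY, dt):
    """Formula (4.1) for commuting (here: diagonal) matrices B_i."""
    expo = sum(B * y for B, y in zip(B_list, dY)) - 0.5 * sum(B @ B for B in B_list) * dt
    # exponential of a diagonal matrix = diagonal matrix of exponentials
    return np.diag(np.exp(np.diag(expo)))


B1 = np.diag([0.3, -1.2, 0.7])
B2 = np.diag([1.0, 0.4, -0.5])
dt = 0.25

x, w = hermegauss(40)            # nodes/weights for the weight exp(-x^2 / 2)
w = w / np.sqrt(2.0 * np.pi)     # normalise to the standard normal law
mean = np.zeros((3, 3))
for x1, w1 in zip(x, w):
    for x2, w2 in zip(x, w):
        dY = (np.sqrt(dt) * x1, np.sqrt(dt) * x2)   # independent N(0, dt) increments
        mean += w1 * w2 * psi([B1, B2], dY, dt)
print(np.allclose(mean, np.eye(3)))
```

The $-\frac12\sum_i B_i^2(t-s)$ term is exactly the Itô correction that makes each diagonal entry a mean-one exponential martingale.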


Therefore, it seems natural to consider the following numerical scheme. Let $\pi : 0 = t_0 < t_1 < \cdots < t_i < \cdots < t_n = T$ be a given partition of the interval $[0,T]$, with mesh-size $k$. The proposed approximation of $u_{t_n}$, the value at time $t_n$ of the solution to equation (3.1), is given by the following recursion

$$u_{n+1} = P_{t_{n+1}-t_n}\,\Psi_{t_n}^{t_{n+1}}\,u_n , \qquad u_0 = \mu . \qquad (4.2)$$
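The recursion (4.2) alternates an exact stochastic step and an exact deterministic step. The finite-dimensional sketch below assumes $d = 1$, a symmetric positive definite matrix in place of $A$ (here a finite-difference $-\frac12\,d^2/dx^2$) and a diagonal $B = \mathrm{diag}(h)$. Under those assumptions $\Psi$ is an entrywise exponential and $P_t = e^{-tA}$ comes from an eigendecomposition. The grid, $h(x) = \cos x$, the initial condition and the function names are hypothetical choices.

```python
import numpy as np


def heat_semigroup(A, t):
    """P_t = exp(-t A) for a symmetric positive semi-definite matrix A."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.exp(-t * lam)) @ Q.T


def splitting_scheme(A, h, mu, times, Y):
    """Scheme (4.2) with d = 1 and B = diag(h):
    u_{n+1} = P_{t_{n+1}-t_n} Psi_{t_n}^{t_{n+1}} u_n,  u_0 = mu."""
    u = np.array(mu, dtype=float)
    for n in range(len(times) - 1):
        dt = times[n + 1] - times[n]
        dY = Y[n + 1] - Y[n]
        u = np.exp(h * dY - 0.5 * h ** 2 * dt) * u   # stochastic step, formula (4.1)
        u = heat_semigroup(A, dt) @ u                # deterministic step P_dt
    return u


# Hypothetical test problem: A = -(1/2) d^2/dx^2 by finite differences (Dirichlet),
# h(x) = cos(x), Gaussian initial condition, one simulated observation path Y.
N, T, steps = 20, 1.0, 50
x = np.linspace(0.0, np.pi, N)
dx = x[1] - x[0]
A = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / (2 * dx ** 2)
h = np.cos(x)
mu = np.exp(-(x - 1.5) ** 2)
times = np.linspace(0.0, T, steps + 1)
rng = np.random.default_rng(1)
Y = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / steps), steps))])
u_T = splitting_scheme(A, h, mu, times, Y)
```

Two sanity checks follow from the structure of the scheme. With $h = 0$ the recursion reproduces $P_T\mu$. With $A = 0$ it reproduces $\exp(hY_T - \frac12h^2T)\,\mu$.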


This is nothing but a Trotter-like product formula. A next step would be to approximate the deterministic operator $P_{t_{n+1}-t_n}$ by a simple and computable one, involving only the generator $-A$. This is a rather standard part, for which some possible answers are

• implicit Euler scheme,

• Crank-Nicolson scheme.
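The two options above, and the remark below that explicit (Taylor-type) schemes are unstable when $A$ is unbounded, can be seen on a finite-difference matrix, whose norm grows like $1/dx^2$. The sketch below assumes such a matrix as a stand-in for $A$. It compares the spectral radius of the explicit Euler step $I - kA$ with the implicit Euler step $(I + kA)^{-1}$ and the Crank-Nicolson step $(I + \frac{k}{2}A)^{-1}(I - \frac{k}{2}A)$. The grid size, time step and helper name are illustrative.

```python
import numpy as np

# Hypothetical stiff test matrix: finite-difference -d^2/dx^2 on (0, 1), Dirichlet.
N = 50
dx = 1.0 / (N + 1)
A = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / dx ** 2
k = 1e-3
I = np.eye(N)

explicit = I - k * A                                         # explicit Euler (Taylor)
implicit = np.linalg.inv(I + k * A)                          # implicit Euler
crank_nicolson = np.linalg.solve(I + 0.5 * k * A, I - 0.5 * k * A)


def spectral_radius(M):
    """Largest modulus of the eigenvalues of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))


for name, M in [("explicit", explicit), ("implicit", implicit),
                ("Crank-Nicolson", crank_nicolson)]:
    print(f"{name:15s} spectral radius = {spectral_radius(M):.3f}")
```

Here $k\,\lambda_{\max}(A) > 2$, so the explicit step amplifies high-frequency modes, while both implicit approximations of $P_k$ remain contractions for every $k > 0$.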


In any case, some desirable properties of any discretization scheme for equation (3.1) should include

• decoupling of the deterministic part and the (generally straightforward to deal with) stochastic part,

• (within the context of nonlinear filtering) availability of a probabilistic interpretation.

The latter issue will be discussed in detail in another paper. In concrete situations, the former property will make the analysis of complete discretization schemes (i.e. including an additional discretization with respect to a "space variable") quite easy.

The analysis made in the next subsection will show that the rate of convergence of the scheme (4.2) is of order $O(k)$.

Obviously, the proof of this result will rely on Theorem 2.10. Note that the discrete stochastic semi-group defined by the family $(V_{t_i}^{t_{i+1}} : i = 0, 1, \ldots)$ already satisfies some of the hypotheses of Theorem 2.10. To prove the needed estimates on the one-step error, one possible approach would be to use a stochastic Taylor formula as in [6,9,10]. Such a stochastic Taylor formula would indeed not be difficult to prove in the context of bilinear stochastic PDE, along the lines of Theorem 2.1 in Newton [6, pp.25-26]. However, it is a general situation in the infinite-dimensional setting that Taylor formulas (either deterministic or stochastic) do not provide suitable schemes. In particular, they yield explicit schemes, which are generally unstable because of the unboundedness of the deterministic operator $A$.

4.1 Rate of convergence of order $O(k)$

Generally speaking, if $B_i \in L(D^r,D^r)$ $(i = 1, 2, \ldots, d)$ for some integer $r$, then $(\Psi_s^t : 0 \le s \le t)$ is a strongly continuous stochastic semi-group in $D^r$. Moreover

$$E(\Psi_s^t) = I , \qquad (4.3)$$

$$\|\Psi_s^t\|^2_{L^2(D^r,D^r)} \le e^{\beta_r^2(t-s)} . \qquad (4.4)$$

Theorem 4.1 Let $u$ denote the unique solution of equation (3.1). If $B_i \in L(D^2,D^2)$, $1 \le i \le d$, and $\mu \in D^2$, then the rate of convergence of the discretization scheme defined by (4.2) is of order $O(k)$, i.e. for all $\mu \in D^2$

$$\big(E|u(t_n) - u_n|^2\big)^{1/2} \le C\,k\,|\mu|_2 .$$


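Proof. Consider the two following discrete stochastic semi-groups defined by

$$U_i = U_{t_i}^{t_{i+1}} , \qquad V_i = V_{t_i}^{t_{i+1}} ,$$

where

$$V_s^t = P_{t-s}\,\Psi_s^t .$$

It follows from (4.3) and (4.4) respectively that

$$E(V_s^t) = P_{t-s} , \qquad (4.5)$$

$$\|V_s^t\|^2_{L^2(D^2,D^2)} \le e^{\beta_2^2(t-s)} . \qquad (4.6)$$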
Now (3.15) and (4.5) imply

$$E(\delta_s^t) = 0 , \qquad \text{where } \delta_s^t = U_s^t - V_s^t .$$

According to Theorem 2.10, it will be enough to get some estimates on the one-step error $\delta_s^t$. By the Itô formula

$$V_s^t = P_{t-s} + \sum_{i=1}^d \int_s^t P_{t-s}\,B_i\,\Psi_s^r\,dY_r^i .$$

By difference with (3.18)

$$\delta_s^t = U_s^t - V_s^t = \sum_{i=1}^d \int_s^t P_{t-r}\,B_i\,\delta_s^r\,dY_r^i + \sum_{i=1}^d \int_s^t P_{t-r}\,(B_i\,P_{r-s} - P_{r-s}\,B_i)\,\Psi_s^r\,dY_r^i .$$

Therefore, for all $x \in D^2$

$$E|\delta_s^t x|^2 \le 2\sum_{i=1}^d E\int_s^t |P_{t-r}\,B_i\,\delta_s^r x|^2\,dr + 2\sum_{i=1}^d E\int_s^t |P_{t-r}\,(B_i\,P_{r-s} - P_{r-s}\,B_i)\,\Psi_s^r x|^2\,dr . \qquad (4.7)$$

The next step is to get some estimate on $(P_t B_i - B_i P_t)$. Since $B_i \in L(D^2,D^2)$, $1 \le i \le d$, it follows that $(AB_i - B_iA) \in L(D^2,H)$. Introducing, as usual in perturbation problems, the operator $P_{t-s}B_iP_s$ and differentiating with respect to $s$, gives for all $x \in D^2$

$$(P_tB_i - B_iP_t)x = -\int_0^t P_{t-s}\,(AB_i - B_iA)\,P_sx\,ds .$$

Therefore

$$|(P_tB_i - B_iP_t)x| \le \alpha_i\,t\,|x|_2 ,$$

where

$$\alpha_i = \|AB_i - B_iA\|_{L(D^2,H)} .$$
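The perturbation identity for $P_tB_i - B_iP_t$ above, including its minus sign, can be checked with matrices. The check assumes a symmetric positive definite matrix in place of $A$, so that $P_t = e^{-tA}$ is a contraction, and an arbitrary non-commuting matrix in place of $B_i$. It also confirms the resulting bound, here in the $H$-norm only. The matrix sizes, seed and quadrature order are illustrative.

```python
import numpy as np


def semigroup(A, t):
    """P_t = exp(-t A) for a symmetric matrix A, via eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.exp(-t * lam)) @ Q.T


rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite generator
B = rng.standard_normal((n, n))      # a bounded operator that does not commute with A
t = 0.7

lhs = semigroup(A, t) @ B - B @ semigroup(A, t)

# Gauss-Legendre quadrature of  -int_0^t P_{t-s} (AB - BA) P_s ds
x, w = np.polynomial.legendre.leggauss(60)
s = 0.5 * t * (x + 1.0)
w = 0.5 * t * w
C = A @ B - B @ A
rhs = -sum(wk * semigroup(A, t - sk) @ C @ semigroup(A, sk) for sk, wk in zip(s, w))

print(np.allclose(lhs, rhs))
# Since ||P_s|| <= 1, the commutator is bounded by t * ||AB - BA||.
print(np.linalg.norm(lhs, 2) <= t * np.linalg.norm(C, 2))
```

In the proof the same argument is run with $\|P_s\|_{L(D^2,D^2)} \le 1$, which puts the norm $|x|_2$ on the right-hand side.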

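Combining (4.7), (4.4) and (4.1) gives

$$E|\delta_s^t x|^2 \le 2\beta_0^2\int_s^t E|\delta_s^r x|^2\,dr + 2\alpha^2\,e^{\beta_2^2(t-s)}\int_s^t (r-s)^2\,dr\ |x|_2^2 ,$$

where

$$\alpha = \Big(\sum_{i=1}^d \alpha_i^2\Big)^{1/2} .$$

Gronwall's lemma now implies that

$$\|\delta_s^t\|_{L^2(D^2,H)} \le \Big(\frac{2}{3}\Big)^{1/2}\alpha\,e^{(\beta_0^2 + \beta_2^2/2)(t-s)}\,(t-s)^{3/2} .$$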
□

It can be seen from the proof that if $(AB_i - B_iA) \in L(D^1,H)$, then less regularity is required on the data. Indeed

Theorem 4.2 Let $u$ denote the unique solution of equation (3.1). Assume that $B_i \in L(D^1,D^1)$, $1 \le i \le d$, and $\mu \in D^1$. If in addition

$$(AB_i - B_iA) \in L(D^1,H) , \qquad (i = 1, \ldots, d) \qquad (4.8)$$

then the rate of convergence of the discretization scheme defined by (4.2) is of order $O(k)$, i.e. for all $\mu \in$
1. Introduction

In this paper we shall prove a result about linear, stochastic partial differential equations and apply it to the question of exact, finite-dimensional recursive computation of optimal filters. Let $\{Y(t),\ 0 \le t \le T\}$ be an $\mathbb{R}^p$-valued Brownian motion. Throughout, we assume that $Y$ is the canonical process on $(\Omega, \mathcal{F}, P)$, where $\Omega = \{f \in C([0,T];\mathbb{R}^p),\ f(0) = 0\}$, $\mathcal{F}$ is the $\sigma$-algebra of Borel sets of $\Omega$ with respect to the sup norm topology, and $P$ is Wiener measure. On $\mathbb{R}^d$ define the operator

$$A = \frac12\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_{i=1}^d b_i(x)\,\frac{\partial}{\partial x_i} + c(x) ,$$

and assume that $a(x) = [a_{ij}(x)]_{1 \le i,j \le d}$ is symmetric. We shall consider the solution $p(x,t)$ to the stochastic p.d.e.

$$dp(x,t) = Ap(x,t)\,dt + \sum_{k=1}^p h_k(x)\,p(x,t)\,dY^k(t) \qquad (1.1)$$

$$p(\cdot,0) = p_0(\cdot) . \qquad (1.2)$$

Sometimes we shall write $p(\cdot,t|Y)$ to emphasize the dependence of $p$ on $Y^t$. Suppose that a set of linearly independent functions $\{\phi_1, \ldots, \phi_n\} \subset L^2(\mathbb{R}^d)$ is given, and form the random vector $\Phi_t^{(n)} = ((\phi_1, p(\cdot,t|Y)), \ldots, (\phi_n, p(\cdot,t|Y)))$. Here $(\phi,\psi) = \int \phi(x)\,\psi(x)\,dx$. Our main result, Theorem 1.1, applies the stochastic calculus of variations, or Malliavin calculus, to (1.1) in the case that $A$ is uniformly elliptic and all the coefficients are analytic functions. It states Lie algebraic conditions under which the probability distribution of $\Phi_t^{(n)}$ admits a density with respect to Lebesgue measure on $\mathbb{R}^n$ for any $n$. This result is a refinement of a similar theorem proved in Ocone [18], in which the coefficients are assumed to be only infinitely differentiable, but the initial condition $p(\cdot|0)$ is assumed to be smooth. Introducing the analyticity condition not only allows non-smooth initial conditions, but also leads to a simpler Lie algebraic criterion that is easier to check.

*Partially supported by USACCE under Contract DAJA4S-87-M-029<

To state Theorem 1.1 we introduce the following notation. Let $\mathcal{A}$ denote the Lie algebra of operators generated by $\tilde{A} = A - \frac12\sum_{j=1}^p h_j^2$ and (multiplication by) $h_j(\cdot)$, $1 \le j \le p$, using the Lie bracket $[B,C] = C\circ B - B\circ C$. The elements of $\mathcal{A}$ are all partial differential operators with variable coefficients. For $x_0 \in \mathbb{R}^d$ let $\mathcal{A}(x_0)$ denote the linear space of operators consisting of the operators of $\mathcal{A}$ with their coefficients frozen at $x_0$; thus, for example, $x\,\partial/\partial x \in \mathcal{A}$ and $x_0 \ne 0$ imply $\partial/\partial x \in \mathcal{A}(x_0)$. Also, given a smooth function $f$, $\{(Bf)(x_0),\ B \in \mathcal{A}\} = \{(Cf)(x_0),\ C \in \mathcal{A}(x_0)\}$. Finally, let $C_b^\omega$ denote the space of real valued, bounded functions on $\mathbb{R}^d$ which are analytic at each point of $\mathbb{R}^d$ and whose derivatives of all orders are bounded. Also, let $H^k(\mathbb{R}^d)$ be the Sobolev space of (integral) order $k$ with norm $\|f\|_k$.

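Theorem 1.1. Assume that

$$a(x) \ge \varepsilon I \quad \text{for some } \varepsilon > 0 \text{ and all } x \in \mathbb{R}^d , \qquad (1.3)$$

$$c(\cdot),\ b_i(\cdot),\ a_{ij}(\cdot),\ h_k(\cdot) \in C_b^\omega \quad \text{for } 1 \le i,j \le d,\ 1 \le k \le p , \qquad (1.4)$$

$$\frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}} \in \mathcal{A}(x_0) \quad \text{for every multi-index } \alpha . \qquad (1.5)$$

Then for any $t > 0$, for any $n$, for any linearly independent set $\{\phi_1, \ldots, \phi_n\} \subset L^2(\mathbb{R}^d)$, the probability distribution of $\Phi_t^{(n)}$ admits a density with respect to Lebesgue measure.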

Section  2  of  this  paper  gives  the  proof  of  Theorem  1.1. 

Remark. Because of (1.3) and (1.4), equation (1.1)-(1.2) has a unique, adapted solution satisfying

$$E\int_0^T \|p(t)\|_k^2\,dt < \infty \quad \text{for } k < -d/2 \text{ and } T > 0 .$$

Moreover, $p(\cdot,t) \in C([0,T]; H^k(\mathbb{R}^d))$ a.s. for all $k$, and $p(\cdot,t) \in C^\infty(\mathbb{R}^d)$ for all $t > 0$ a.s. These facts are proved in Pardoux [21], see especially pp. 227-228. For this reason, $\Phi_t^{(n)}$ is well-defined for any $\phi_i \in \bigcup_{k<0} H^k(\mathbb{R}^d)$. The reasoning used to prove Theorem 1.1 can easily be extended to show that the theorem is still true if the condition $\{\phi_1, \ldots, \phi_n\} \subset L^2(\mathbb{R}^d)$ is replaced by $\{\phi_1, \ldots, \phi_n\} \subset H^k(\mathbb{R}^d)$ for any $k < 0$.

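We can apply Theorem 1.1 to the nonlinear filtering problem

$$dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW(t) , \qquad X(0) = X_0 \qquad (1.6)$$

$$dY(t) = h(X(t))\,dt + dB(t) , \qquad Y(0) = 0 \qquad (1.7)$$

where $W$ and $B$ are, respectively, $\mathbb{R}^d$- and $\mathbb{R}^p$-valued Brownian motions, $X(t)$ evolves in $\mathbb{R}^d$, and $Y(t)$ evolves in $\mathbb{R}^p$. Let $a(x) = \sigma\sigma^T(x)$. In compliance with (1.3) and (1.4), we shall assume

$$a(x) \ge \varepsilon I , \quad \text{and} \quad b_i,\ h_k \in C_b^\omega . \qquad (1.8)$$

Let $p(\cdot,t|Y)$ denote the solution to

$$dp(x,t) = A_0\,p(x,t)\,dt + \sum_{k=1}^p h_k(x)\,p(x,t)\,dY^k(t) , \qquad p(\cdot,0) = p_0(\cdot) , \qquad (1.9)$$

where $A_0u(x) = \frac12\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\big(a_{ij}(x)\,u(x)\big) - \sum_i \frac{\partial}{\partial x_i}\big(b_i(x)\,u(x)\big)$. Then $p(\cdot,t|Y)$ is the unnormalised conditional density of $X(t)$ given the sigma algebra $\mathcal{F}_t^Y = \sigma\{Y(s),\ s \le t\}$. In (1.6)-(1.7) $Y$ is not a Brownian motion, but the measure induced by $Y$ on $\Omega$ is absolutely continuous with respect to Wiener measure. Hence the conclusion of Theorem 1.1 will not be affected when we apply it to (1.9).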

Recently there has been interest in determining when conditional statistics, such as $(\phi_i, p(\cdot,t))$, admit finite dimensional, recursive realizations, and Theorem 1.1 has implications for this question. We shall say that the collection of statistics $\{(\phi_i, p(\cdot,t)),\ 1 \le i < \infty\}$ admits a finite dimensional, regular sufficient statistic $\alpha_t$ if $\alpha_t : \Omega \to M$ is a measurable map into a finite dimensional $C^1$-manifold $M$, such that, for each $i$, there is a $\theta_i \in C^1(M;\mathbb{R})$ with $(\phi_i, p(\cdot,t)) = \theta_i(\alpha_t)$. Let

$$\pi(\cdot,t|Y) = \frac{p(\cdot,t|Y)}{\int p(x,t|Y)\,dx}$$

denote the normalised conditional density of $X(t)$ given $\mathcal{F}_t^Y$. We shall say that $\{(\phi_i, \pi(\cdot,t)),\ 1 \le i < \infty,\ t > 0\}$ admits a finite dimensional, regular, recursive, sufficient statistic if for each $i$ there is a $\theta_i \in C^1(M;\mathbb{R})$ with $(\phi_i, \pi(\cdot,t)) = \theta_i(\alpha(t))$, where

$$d\alpha(t) = f(\alpha(t))\,dt + \sum_{i=1}^p g_i(\alpha(t))\circ dY^i(t) \qquad (1.10)$$

for some $C^1$-vector fields $f$ and $g_i$, $1 \le i \le p$, on $M$.

Corollary 1.2. Assume (1.8) and let $\mathcal{A}$ be the Lie algebra generated by $A_0 - \frac12\sum_{i=1}^p h_i^2$ and $h_1, \ldots, h_p$. If $\mathcal{A}$ satisfies (1.5), there is no countably infinite, linearly independent set $\{\phi_i,\ 1 \le i < \infty\} \subset L^2(\mathbb{R}^d)$ such that either $\{(\phi_i, p(\cdot,t)),\ 1 \le i < \infty\}$ admits a finite dimensional, regular sufficient statistic for any $t > 0$, or $\{(\phi_i, \pi(\cdot,t)),\ 1 \le i < \infty,\ t > 0\}$ admits a finite dimensional, regular, recursive sufficient statistic.

Proof. The conclusion concerning $\{(\phi_i, \pi(\cdot,t)),\ 1 \le i < \infty,\ t > 0\}$ follows from that about $\{(\phi_i, p(\cdot,t)),\ 1 \le i < \infty\}$ by the identity

$$p(\cdot,t|Y) = \pi(\cdot,t|Y)\,\exp\Big[\int_0^t \hat{h}(s)\cdot dY(s) - \frac12\int_0^t |\hat{h}(s)|^2\,ds\Big] ,$$

where $\hat{h}_i(s) = \int h_i(x)\,\pi(x,s)\,dx$. To prove the result about $\{(\phi_i, p(\cdot,t)),\ 1 \le i < \infty\}$, let us assume that a finite dimensional, regular statistic exists and derive a contradiction to Theorem 1.1. Existence of such an $\alpha$ implies that for any $n$, $\Phi_t^{(n)} = ((\phi_1, p(\cdot,t)), \ldots, (\phi_n, p(\cdot,t))) = (\theta_1(\alpha_t), \ldots, \theta_n(\alpha_t))$. However, because the $\theta_i$ are differentiable, if $n > \dim M$, the Lebesgue outer measure in $\mathbb{R}^n$ of $\{(\theta_1(m), \ldots, \theta_n(m)) \mid m \in M\}$ is zero. Thus $\Phi_t^{(n)}$ cannot admit a probability density, in contradiction to Theorem 1.1. □

A simple example in which (1.3)-(1.5) hold is $A = \frac12\,d^2/dx^2$, $h(x) = \cos x$, and $x_0 \notin \{n\pi,\ (2n+1)\pi/2,\ n \in \mathbb{Z}\}$.
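The example can be partly checked symbolically. The sketch below assumes $d = p = 1$, $c = 0$ and the bracket convention $[B,C] = C\circ B - B\circ C$ used in the text. It computes the first bracket $[\tilde{A}, h]$ with $\tilde{A} = \frac12\,d^2/dx^2 - \frac12\cos^2 x$ and $h(x) = \cos x$. The function names are illustrative. The sketch only covers the first bracket and does not verify (1.5) in full.

```python
import sympy as sp

x = sp.symbols("x")
f = sp.Function("f")(x)


def A_tilde(g):
    """A - h^2/2 with A = (1/2) d^2/dx^2 and h(x) = cos(x)."""
    return sp.diff(g, x, 2) / 2 - sp.cos(x) ** 2 * g / 2


def h_op(g):
    """Multiplication by h(x) = cos(x)."""
    return sp.cos(x) * g


def bracket(B, C):
    """[B, C] = C o B - B o C, the convention used in the text."""
    return lambda g: C(B(g)) - B(C(g))


first = sp.expand(bracket(A_tilde, h_op)(f))
print(first)   # a first-order operator applied to f
```

The result is $\sin x\,f' + \frac12\cos x\,f$. Frozen at $x_0$, it gives $\sin(x_0)\,\partial/\partial x + \frac12\cos(x_0)$, while $h$ frozen at $x_0$ is the constant $\cos x_0$. So $\partial/\partial x \in \mathcal{A}(x_0)$ as soon as $\sin x_0 \ne 0$ and $\cos x_0 \ne 0$, which matches the points excluded above.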

Recently, both Lie algebraic techniques and the Malliavin calculus have been increasingly used in nonlinear filtering theory, and we wish to compare these applications with Corollary 1.2. Brockett and Clark [4] and Mitter [16] introduced Lie algebraic and geometric methods into filtering with the insight that, in formal analogy to realization theory in differential geometric control, existence of finite dimensional, recursive filters should impose restrictions on the structure of $\mathcal{A}$. This inspired a lot of work into the classification of the algebras $\mathcal{A}$ associated to filtering problems and into using algebraic properties to seek finite dimensionally computable optimal filters. A nice survey of this effort and related topics may be found in Marcus [13]. Also, Chaleyat-Maurel and Michel [6] and, independently, Hijab [10] rigorously developed the original suggestions of Brockett, Clark, and Mitter. For example, in [6] Chaleyat-Maurel and Michel introduce the following notion of universal finite dimensional computability, which we describe only roughly and in modified form. $\{p(\cdot,t),\ t \ge 0\}$ (or $\{\pi(\cdot,t),\ t \ge 0\}$) is universally FDC with respect to a class of infinitely differentiable test functions $S$ if there is a system (1.10) with $C^\infty$-vector fields such that for every $\phi \in S$ there is a $\theta \in C^\infty(M)$ with $\theta(\alpha(t)) = (\phi, p(\cdot,t))$ (or $(\phi, \pi(\cdot,t))$). By comparing the Itô derivatives of $\theta(\alpha(t))$ and $(\phi, p(\cdot,t))$ at $t = 0$, one can derive a relationship between $\mathcal{A}$ and the Lie algebra of vector fields on $M$ generated by $f, g_1, \ldots, g_p$, as long as $S$ is large enough. For an appropriate choice of $S$, say all infinitely differentiable $\phi$ such that $(\phi, p(\cdot,t))$ and all its Itô derivatives make sense, it is shown in [6] that $\dim\mathcal{A}(x_0) \le \dim M$. By way of contrast, Corollary 1.2 draws inferences about finite dimensional computability from the existence of probability densities for $\Phi_t^{(n)}$ for any $n$. This is a priori a much stronger property and requires the stronger condition (1.5) on $\mathcal{A}(x_0)$. However, we are able to weaken the differentiability requirements on $\phi$ and $\theta$ in the definition of finite dimensional computability.

Other applications of Malliavin calculus to filtering may be found in the work of Michel [14], Bismut and Michel [2], Ferreyra [8], and Kusuoka and Stroock [11]. These authors study the existence and smoothness of $p(x,t)$ as a function of $x$. That is, they determine when the unnormalized conditional distribution, as a random measure on $\mathbb{R}^d$, admits a smooth density $p(x,t)$. In this paper, we are using Malliavin's calculus to study the measure induced on a function space by the solution of Zakai's equation. On the other hand, our work is related to that of Chaleyat-Maurel [5], who studies continuity of nonlinear filters using Malliavin calculus. She gives conditions under which conditional statistics are in the Sobolev spaces on Wiener space, but does not analyze the Malliavin covariance matrix as we do here.


2. Proof of Theorem 1.1.

Our proof relies heavily on the analysis of [18], which we shall use without repeating proofs. For simplicity of calculation, we assume throughout that $p = 1$.

We first need to define the gradient operator $D$ on Wiener functionals. Let $F \in L^2(\Omega,P)$. It admits an Itô-Wiener expansion

$$F = \sum_{k=0}^\infty f_k \circ Y^k ,$$

where each $f_k \in L_s^2([0,T]^k)$, which is the subspace of symmetric functions in $L^2([0,T]^k)$, and where $f_k \circ Y^k$ is the multiple Wiener integral

$$f_k \circ Y^k = \int_0^T \cdots \int_0^T f_k(t_1, \ldots, t_k)\,dY(t_1)\cdots dY(t_k) .$$

Let $\mathbb{D}^{2,1}$ denote the set of $F \in L^2(\Omega,P)$ satisfying

$$\sum_{k=1}^\infty k\,(k!)\,\|f_k\|^2_{L^2([0,T]^k)} < \infty . \qquad (2.1)$$

If $F \in \mathbb{D}^{2,1}$, we may define

$$D_sF(Y) = \sum_{k=1}^\infty k\,f_k(\ldots,s) \circ Y^{k-1} , \qquad 0 \le s \le T , \qquad (2.2)$$

where $f_k(\ldots,s)$ is the element of $L^2([0,T]^{k-1})$ obtained by fixing the last variable at $s$. Because of (2.1), the series on the right hand side converges in $L^2(\Omega\times[0,T], P\times m)$, where $m$ denotes Lebesgue measure on $[0,T]$, and thus $D_sF(Y)$ is well defined up to sets of $P\times m$-measure zero. In fact,

$$E\Big[\int_0^T (D_sF)^2\,ds\Big] = \sum_{k=1}^\infty k\,(k!)\,\|f_k\|^2_{L^2([0,T]^k)} .$$


Next, given $F = (F_1, \ldots, F_n) \in (\mathbb{D}^{2,1})^n$, we define the Malliavin covariance matrix of $F$:

$$\nabla^TF\,\nabla F = \Big[\int_0^T D_sF_i\,D_sF_j\,ds\Big]_{1 \le i,j \le n} . \qquad (2.3)$$

Let $P\circ F^{-1}$ denote the probability distribution of $F$; for a Borel set $A \subset \mathbb{R}^n$, $P\circ F^{-1}(A) = P(F \in A)$. The Malliavin covariance matrix is used to study the regularity properties of $P\circ F^{-1}$. For example, Bouleau and Hirsch [3] prove the following result.

Proposition 2.1. Suppose that $F \in (\mathbb{D}^{2,1})^n$ and

$$\det\,\nabla^TF\,\nabla F > 0 \quad \text{a.s.} \qquad (2.4)$$

Then $P\circ F^{-1}$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^n$.
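The covariance matrix (2.3) and condition (2.4) can be illustrated on two Gaussian examples with $T = 1$, where $D_sF$ is deterministic. For $F = (Y(1), \int_0^1 s\,dY(s))$ one has $D_sF_1 = 1$ and $D_sF_2 = s$, so the matrix is $\begin{pmatrix}1 & 1/2\\ 1/2 & 1/3\end{pmatrix}$ with determinant $1/12$, and the law has a density. For $F = (Y(1), 2Y(1))$ the determinant vanishes and the law is carried by a line. The trapezoidal helper `malliavin_matrix` below is a hypothetical name, and the grid is an illustrative choice.

```python
import numpy as np


def malliavin_matrix(grad, s):
    """[int_0^T D_s F_i D_s F_j ds]_{ij} by the trapezoidal rule.

    grad[i] holds the values of s -> D_s F_i on the grid s, shape (n, len(s)).
    """
    prod = grad[:, None, :] * grad[None, :, :]          # D_s F_i * D_s F_j
    ds = np.diff(s)
    return np.sum(0.5 * (prod[..., 1:] + prod[..., :-1]) * ds, axis=-1)


s = np.linspace(0.0, 1.0, 1001)
# F = (Y(1), int_0^1 s dY(s)):  D_s F_1 = 1, D_s F_2 = s  (non-degenerate)
good = malliavin_matrix(np.vstack([np.ones_like(s), s]), s)
# F = (Y(1), 2 Y(1)):  D_s F_1 = 1, D_s F_2 = 2  (degenerate, no density)
bad = malliavin_matrix(np.vstack([np.ones_like(s), 2.0 * np.ones_like(s)]), s)
print(np.linalg.det(good), np.linalg.det(bad))
```

For non-Gaussian functionals such as $\Phi_t^{(n)}$, $D_sF$ is random, and (2.4) has to be shown almost surely. That is the role of (2.11) below.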

Proposition 2.1 presents the simplest application of the Malliavin covariance matrix. The theory of the Malliavin calculus shows that moment bounds on the inverse of $\nabla^TF\,\nabla F$ imply smoothness properties of the density $d(P\circ F^{-1})/dx$. For an introduction to the complete theory and its applications, see Ocone [20] or Michel and Pardoux [15].

We shall use Proposition 2.1 with $F = \Phi_t^{(n)}$ to prove Theorem 1.1. Notice that $\Phi_t^{(n)}$ is adapted to $\sigma\{Y(s),\ s \le t\}$, so that $D_s\Phi_t^{(n)} = 0$ for $s > t$. Therefore, we may replace $T$ by $t$ in (2.3) in discussing $\nabla^T\Phi_t^{(n)}\,\nabla\Phi_t^{(n)}$, and so for the rest of the argument we assume $T = t$.

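Before continuing, we note that it suffices to consider operators $A$ in (1.1) of the form that appears in Zakai's equation (1.9), modulo a potential term:

$$Au(x) = \frac12\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}\big((\sigma\sigma^T)_{ij}(x)\,u(x)\big) - \sum_i \frac{\partial}{\partial x_i}\big(b_i(x)\,u(x)\big) - c(x)\,u(x) \qquad (2.5)$$

where $\sigma(x) : \mathbb{R}^d \to \mathbb{R}^{d\times d}$. This is possible because for any $a(x) = [a_{ij}(x)]$ satisfying (1.3) and (1.4) there is a $\sigma(x)$ with entries in $C_b^\omega$ satisfying $a(x) = \sigma\sigma^T(x)$. For example, following Friedman [9], pp. 128-129, we may take $\sigma(x) = \frac{1}{2\pi i}\oint_\Gamma \sqrt{z}\,(a(x) - zI)^{-1}\,dz$, where $\Gamma$ is a simple closed curve in $\mathrm{Re}\,z > 0$ containing all the eigenvalues of $a(x)$ for all $x \in \mathbb{R}^d$.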
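Friedman's contour formula can be evaluated numerically for a fixed $x$. The sketch below assumes that $a(x)$ is a constant symmetric positive definite matrix and takes $\Gamma$ to be a circle in $\mathrm{Re}\,z > 0$ around its spectrum. On a circle, the periodic trapezoidal rule converges geometrically. With $(a - zI)^{-1}$ as written and a counterclockwise $\Gamma$, the integral equals $-a^{1/2}$, which still satisfies $\sigma\sigma^T = a$. The function name and the test matrix are illustrative.

```python
import numpy as np


def sigma_by_contour(a, nodes=200):
    """sigma = (1/(2 pi i)) * contour integral of sqrt(z) (a - zI)^{-1} dz.

    Gamma is a counterclockwise circle in Re z > 0 enclosing the spectrum of the
    symmetric positive definite matrix a; the trapezoidal rule is used on it.
    With this sign convention the result is -a^{1/2}.
    """
    lam = np.linalg.eigvalsh(a)
    center = 0.5 * (lam.min() + lam.max())
    radius = 0.5 * (lam.max() - lam.min()) + 0.5 * lam.min()   # leftmost point lam.min()/2 > 0
    theta = 2.0 * np.pi * np.arange(nodes) / nodes
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / nodes)
    eye = np.eye(a.shape[0])
    total = sum(np.sqrt(zk) * np.linalg.inv(a - zk * eye) * dzk for zk, dzk in zip(z, dz))
    return (total / (2j * np.pi)).real


a = np.array([[2.0, 1.0], [1.0, 3.0]])
sigma = sigma_by_contour(a)
print(np.allclose(sigma @ sigma.T, a))
```

Since $\sqrt{z}$ is analytic in $\mathrm{Re}\,z > 0$, the same contour works uniformly in $x$ under (1.3)-(1.4), which is what gives $\sigma \in C_b^\omega$.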
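Thus, by suitably choosing $b$ and $c$, we can transform any operator of the form in (1.1) to the form in (2.5). The advantage of (2.5) is that $A + c$ is the forward generator of the diffusion associated to

$$dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW(t) , \qquad (2.6)$$

and we can then represent the solution to (1.1) with the Kallianpur-Striebel formula from nonlinear filtering. Suppose that $W$, and hence $X$, are defined on a second probability space $(\Omega', \mathcal{F}', Q)$. Extend $W$, $X$, and the canonical process $Y$ on $(\Omega, \mathcal{F}, P)$ to the product probability space $(\Omega\times\Omega', \mathcal{F}\times\mathcal{F}', P\times Q)$ by $W(\omega,\omega')(t) = W(\omega')(t)$, $Y(\omega,\omega')(t) = Y(\omega)(t)$, etc. Let $E_Q$ denote expectation with respect to $Q$ on $\Omega'$, let $X_{x_0}(t)$ be the solution of (2.6) with $X_{x_0}(0) = x_0$, and set

$$L(x_0,t) = \exp\Big[\int_0^t h(X_{x_0}(s))\,dY(s) - \frac12\int_0^t h^2(X_{x_0}(s))\,ds\Big]$$

$$= \exp\Big[h(X_{x_0}(t))\,Y(t) - \int_0^t Y(s)\,\nabla h(X_{x_0}(s))\cdot dX_{x_0}(s) - \frac12\int_0^t \Big(h^2(X_{x_0}(s)) + Y(s)\,\mathrm{tr}\big(a(X_{x_0}(s))\,\nabla^2h(X_{x_0}(s))\big)\Big)\,ds\Big] .$$

Let $p(\cdot,t|Y)$ solve (1.1)-(1.2) with $A$ given by (2.5). Then the Kallianpur-Striebel formula, modified by the potential $c$, gives

$$(\phi, p(\cdot,t|Y)) = \int p_0(x_0)\,E_Q\Big[\phi(X_{x_0}(t))\,L(x_0,t)\,\exp\Big\{-\int_0^t c(X_{x_0}(s))\,ds\Big\}\Big]\,dx_0 \quad \text{for } P\text{-a.e. } Y . \qquad (2.7)$$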
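Formula (2.7) suggests a direct Monte Carlo estimate of $(\phi, p(\cdot,t|Y))$ for one fixed observation path. The sketch below is a minimal one, with several assumptions labelled here. It takes $d = p = 1$ and replaces $p_0$ by a point mass at $x_0$. It simulates $X$ with an Euler-Maruyama scheme on the grid of $Y$ and accumulates the weight $\int h\,dY - \frac12\int h^2\,ds - \int c\,ds$ with left-point (Itô) sums. The function `ks_estimate` and the test case are illustrative. With $h = 0$, $b = 0$, $\sigma = 1$ and constant $c$, the exact value is $e^{-cT}E[\phi(x_0 + W_T)]$.

```python
import numpy as np


def ks_estimate(phi, x0, b, sigma, h, c, Y, T, paths=20000, seed=0):
    """Monte Carlo sketch of (2.7) for one fixed observation path Y (d = p = 1),
    with initial condition concentrated at x0:
        E_Q[ phi(X(T)) * exp(int h(X) dY - 1/2 int h(X)^2 ds - int c(X) ds) ].
    """
    rng = np.random.default_rng(seed)
    steps = len(Y) - 1
    dt = T / steps
    X = np.full(paths, float(x0))
    log_w = np.zeros(paths)
    for n in range(steps):
        dY = Y[n + 1] - Y[n]
        hX = h(X)
        log_w += hX * dY - 0.5 * hX ** 2 * dt - c(X) * dt    # left-point (Ito) sums
        X = X + b(X) * dt + sigma(X) * rng.normal(0.0, np.sqrt(dt), paths)
    return np.mean(phi(X) * np.exp(log_w))


Y = np.zeros(101)   # the observation path plays no role when h = 0
est = ks_estimate(phi=lambda x: x ** 2, x0=0.5, b=lambda x: 0.0 * x,
                  sigma=lambda x: 1.0 + 0.0 * x, h=lambda x: 0.0 * x,
                  c=lambda x: 0.3 + 0.0 * x, Y=Y, T=1.0)
exact = np.exp(-0.3) * (0.5 ** 2 + 1.0)   # e^{-cT} E[(x0 + W_T)^2]
print(est, exact)
```

The robust second expression for $L(x_0,t)$ matters for a pathwise version. It involves $Y$ only through $Y(s)$ itself, not through $dY$, and this is why the right-hand side of (2.7) can be defined for every $Y$.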


(2.7) is proved in Pardoux [21] with $c = 0$, and the method extends easily to non-zero $c$. The right hand side of (2.7) is well defined for every $Y \in \Omega$, and we always use this particular version of $(\phi, p(\cdot,t|Y))$.

We are now in a position to calculate the gradient of $(\phi, p(\cdot,t|Y))$ using some nonlinear filtering theory.

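Lemma 2.2. Under the assumption (1.4), if $\phi \in L^2(\mathbb{R}^d)$, then $(\phi, p(\cdot,t|Y)) \in \mathbb{D}^{2,1}$ and, for $u \le t$,

$$D_u(\phi, p(\cdot,t|Y)) = \int p_0(x_0)\,E_Q\Big[\phi(X_{x_0}(t))\,h(X_{x_0}(u))\,\exp\Big\{-\int_0^t c(X_{x_0}(s))\,ds\Big\}\,L(x_0,t)\Big]\,dx_0 .$$

It follows that

$$\nabla^T(\phi, p(\cdot,t))\,\nabla(\phi, p(\cdot,t)) = \int_0^t \Big(\int p_0(x_0)\,E_Q\Big[\phi(X_{x_0}(t))\,h(X_{x_0}(u))\,e^{-\int_0^t c(X_{x_0}(s))\,ds}\,L(x_0,t)\Big]\,dx_0\Big)^2\,du . \qquad (2.8)$$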

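Proof. By Theorem 3.1 in [17], $(\phi, p(\cdot,t|Y)) = \sum_k f_k \circ Y^k$, where

$$f_k(t_1, \ldots, t_k) = \frac{1}{k!}\int p_0(x_0)\,E_Q\Big[\phi(X_{x_0}(t))\,e^{-\int_0^t c(X_{x_0}(s))\,ds}\,h(X_{x_0}(t_1))\cdots h(X_{x_0}(t_k))\Big]\,dx_0 .$$

We shall show that $(\phi, p(\cdot,t|Y)) \in \mathbb{D}^{2,1}$ by verifying (2.1). If $\phi \in L^2(\mathbb{R}^d)$, Cauchy-Schwarz implies $\int p_0(x_0)\,E_Q|\phi(X_{x_0}(t))|\,dx_0 \le \|\phi\|_{L^2}\,\|q(t)\|_{L^2}$, where $q(x,t)$ is the density of $X(t)$ when $X(0)$ has density $p_0$. However, $q(x,t)$ solves

$$\frac{\partial q}{\partial t} = \frac12\sum_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j}(a_{ij}\,q) - \sum_i \frac{\partial}{\partial x_i}(b_i\,q) , \qquad q(\cdot,0) = p_0 ,$$

and, by the analysis of Pardoux [21], pp. 227-228, $\|q(t)\|_{L^2} < \infty$ for $t > 0$. Thus, if $M = \sup_x|c(x)|$ and $K = \sup_x|h(x)|$,

$$\|f_k\|_{L^2([0,t]^k)} \le e^{Mt}\,\frac{t^{k/2}K^k}{k!}\,\|\phi\|_{L^2}\,\|q(t)\|_{L^2} .$$

It follows easily that $(\phi, p(\cdot,t|Y)) \in \mathbb{D}^{2,1}$.

Next, by (2.2), $D_u(\phi, p(\cdot,t|Y)) = \sum_k g_k \circ Y^{k-1}$ for $u < t$, where

$$g_k(t_1, \ldots, t_{k-1}, u) = \frac{1}{(k-1)!}\int p_0(x_0)\,E_Q\Big[\phi(X_{x_0}(t))\,e^{-\int_0^t c(X_{x_0}(s))\,ds}\,h(X_{x_0}(t_1))\cdots h(X_{x_0}(t_{k-1}))\,h(X_{x_0}(u))\Big]\,dx_0 .$$

By again applying Theorem 3.1 in [17], this time in reverse, we use these expressions for $g_k$ to get the result of the Lemma.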

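We shall next represent the integrand of (2.8) in terms of an inner product between $p$ and the solution of a backward stochastic p.d.e. adjoint to (1.1). This will put us exactly in the situation of [18], and we will use a lemma from [18] to complete the proof. Let

$$v(x,r|Y;\phi) = E_Q\Big[\phi(X_{x,r}(t))\exp\Big\{-\int_r^t c(X_{x,r}(s))\,ds + \int_r^t h(X_{x,r}(s))\,dY(s) - \frac12\int_r^t h^2(X_{x,r}(s))\,ds\Big\}\Big],$$

where $X_{x,r}$ denotes the signal started from $x$ at time $r$.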

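Following the analysis of Pardoux[21], Chapter III,

$$v(x,r|Y,\phi) = e^{-h(x)Y(r)}\,u(x,r|Y,\phi), \tag{2.9}$$

where

$$\frac{\partial u}{\partial r}(\cdot,r|Y,\phi) = -(A^Y)^*u(\cdot,r|Y,\phi), \quad r < t; \qquad u(x,t|Y,\phi) = e^{h(x)Y(t)}\phi(x), \tag{2.10}$$

and $(A^Y)^*$ is the formal adjoint of $A^Y$. ((2.9)-(2.10) is the robust form of what appears in Pardoux[21], except that in [20] there is no potential $c$.) Now from the Markov property of $X_{x_0}$,

$$E_Q\Big[\phi(X_{x_0}(t))\,h(X_{x_0}(r))\exp\Big\{-\int_0^t c(X_{x_0}(s))\,ds\Big\}\mathcal{L}(Y,t)\Big] = \big(v(\cdot,r|Y,\phi),\,h(\cdot)p(\cdot,r|Y)\big).$$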
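The Markov-property identity just stated factorizes a path expectation at the intermediate time $r$ into a forward quantity $p$, which depends on the past of $Y$, and a backward quantity $v$, which depends on its future. The Python sketch below checks the same factorization in a discrete-time, finite-state analogue, which is an assumption made purely for illustration: a Markov chain with transition matrix `P` replaces the diffusion, and each step contributes the hypothetical weight $\exp(h\,\Delta Y_k - \tfrac12 h^2\Delta - c\,\Delta)$ evaluated at the current state. `forward` plays the role of $p(\cdot,r|Y)$, `backward` that of $v(\cdot,r|Y,\phi)$, and both are compared with brute-force enumeration of all paths.

```python
import itertools
import math


def make_weights(h, c, dy, dt):
    """weights[k][x] = exp(h(x) dY_k - 1/2 h(x)^2 dt - c(x) dt) for state x."""
    return [[math.exp(h[x] * dyk - 0.5 * h[x] ** 2 * dt - c[x] * dt)
             for x in range(len(h))] for dyk in dy]


def forward(P, weights, x0, m):
    """p_m(y) = E[1{X_m = y} * prod_{k<m} weights[k][X_k]], with X_0 = x0."""
    n = len(P)
    p = [1.0 if x == x0 else 0.0 for x in range(n)]
    for k in range(m):
        p = [sum(p[x] * weights[k][x] * P[x][y] for x in range(n)) for y in range(n)]
    return p


def backward(P, weights, phi, m, n_steps):
    """v_m(x) = E[phi(X_n) * prod_{m<=k<n} weights[k][X_k] | X_m = x]."""
    n = len(P)
    v = list(phi)
    for k in range(n_steps - 1, m - 1, -1):
        v = [weights[k][x] * sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]
    return v


def brute_force(P, weights, phi, h, x0, m, n_steps):
    """E[phi(X_n) h(X_m) prod_{k<n} weights[k][X_k]] by enumerating all paths."""
    total = 0.0
    for path in itertools.product(range(len(P)), repeat=n_steps):
        xs = (x0,) + path  # X_0, X_1, ..., X_n
        prob, w = 1.0, 1.0
        for k in range(n_steps):
            prob *= P[xs[k]][xs[k + 1]]
            w *= weights[k][xs[k]]
        total += prob * w * phi[xs[n_steps]] * h[xs[m]]
    return total
```

For every intermediate index $m$, `sum(p[x] * h[x] * v[x])` built from `forward` and `backward` agrees with `brute_force` up to rounding, which is the discrete counterpart of $\big(v(\cdot,r|Y,\phi),h(\cdot)p(\cdot,r|Y)\big)$.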

Thus, by (2.8), $\nabla^T(\phi,p(\cdot,t|Y))\,\nabla(\phi,p(\cdot,t|Y)) = \int_0^t \big(v(\cdot,r|Y,\phi),h(\cdot)p(\cdot,r|Y)\big)^2\,dr$. This is precisely the type of formula encountered in the analysis in Ocone[18], which was applied to (1.1) without the added assumption of analyticity. The analysis leading to Lemma 4.18 in [18] may be repeated with only very minor modifications to obtain the following.


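Lemma 2.3. There is a set $\mathcal{N} \subset \Omega$ with $P(\mathcal{N}) = 0$ such that, for every $Y \notin \mathcal{N}$ and every non-zero $\phi$,

$$\int_0^t \big(v(\cdot,r|Y,\phi),h(\cdot)p(\cdot,r|Y)\big)^2\,dr = 0$$

implies $(v(\cdot,r|Y,\phi),Cp(\cdot,r|Y)) = 0$ for $0 < r < t$ and every $C \in A$.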

Remark. The proof of Lemma 2.3 involves repeatedly differentiating the integrand with respect to $r$. Care must be exercised, since $v(\cdot,r|Y,\phi)$ is adapted to the future and $p(\cdot,r|Y)$ to the past of $Y$.

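Proof of Theorem 1.1. By Proposition 2.1 it suffices to show that the matrix $\Gamma = (\Gamma_{ij})$ is invertible a.s., where $\Gamma_{ij} = \nabla^T(\phi_i,p(\cdot,t|Y))\,\nabla(\phi_j,p(\cdot,t|Y))$. Equivalently, we must show that

$$\xi^T\Gamma\xi > 0 \quad \text{for all non-zero } \xi \in \mathbb{R}^n, \text{ a.s.}$$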

However, since $\xi^T\Gamma\xi = \nabla^T\big(\sum_i\xi_i\phi_i,p(\cdot,t|Y)\big)\,\nabla\big(\sum_i\xi_i\phi_i,p(\cdot,t|Y)\big)$, it is enough to show

$$\nabla^T(\phi,p(\cdot,t|Y))\,\nabla(\phi,p(\cdot,t|Y)) > 0 \quad \forall\,\phi \ne 0 \text{ and } \forall\,Y \in \mathcal{N}^c, \tag{2.11}$$

where $\mathcal{N}$ is the set found in Lemma 2.3, because $P(\mathcal{N}) = 0$. Suppose to the contrary that for some $\phi \ne 0$ and some $Y \in \mathcal{N}^c$, $\nabla^T(\phi,p(\cdot,t|Y))\,\nabla(\phi,p(\cdot,t|Y)) = 0$. Then by Lemmas 2.2 and 2.3, $(v(\cdot,r|Y,\phi),Cp(\cdot,r|Y)) = 0$, $0 < r < t$, for all $C \in A$. Recall that $p(\cdot,r|Y) \in H^k(\mathbb{R}^d)$ for all $k$. Similarly, by applying the analysis of Pardoux[21], pp. 227-228, to the equation (2.10) for $u$, one finds that $v(\cdot,r|Y,\phi) \in H^k(\mathbb{R}^d)$ for all $k$. Therefore, we may integrate by parts to find that $(C^*v(\cdot,r|Y,\phi),p(\cdot,r|Y)) = 0$, $0 < r < t$, for all $C \in A$. Taking $r \downarrow 0$ and using (1.2), $Cv(x_0,0|Y,\phi) = 0$ for every $C \in A^*_{x_0}$, where $A^*_{x_0}$ is the space of operators formed by freezing the coefficients of the operators in $A^* = \{C^* : C \in A\}$ at $x_0$. It is clear that if $A$ has property (1.5), then so does $A^*$.
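Hence

$$\frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}}\,v(x,0|Y,\phi)\Big|_{x=x_0} = 0 \quad \text{for all multi-indices } \alpha. \tag{2.12}$$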


By applying Theorem 6.2, p. 221, in Eidel'man[7] to (2.10), one finds that $v(\cdot,0|Y,\phi) \in C^\omega$, i.e. it is real analytic. Since a real-analytic function on $\mathbb{R}^d$ all of whose derivatives vanish at $x_0$ is identically zero, (2.12) implies that $v(\cdot,0|Y,\phi) \equiv 0$, and hence, by (2.9) and $Y(0) = 0$, $u(\cdot,0|Y,\phi) \equiv 0$. But $u$ satisfies the backward parabolic p.d.e. (2.10), and $(A^Y)^*$ is uniformly elliptic because of (1.3). Theorem II.1 of Bardos and Tartar[1] on backward uniqueness of evolution equations therefore applies and shows that $u(x,t|Y,\phi) = e^{h(x)Y(t)}\phi(x) = 0$ also. This contradicts the initial assumption that $\phi \ne 0$, and completes the proof.
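The first step of the proof of Theorem 1.1, which replaces invertibility of the matrix from Proposition 2.1 by the scalar condition (2.11), only uses that this matrix, call it $\Gamma$ here, is a Gram matrix. Assuming, consistently with (2.8), that $\Gamma_{ij} = \int_0^t D_u(\phi_i,p(\cdot,t|Y))\,D_u(\phi_j,p(\cdot,t|Y))\,du$, bilinearity gives $\xi^T\Gamma\xi = \int_0^t\big(\sum_i\xi_i D_u(\phi_i,p(\cdot,t|Y))\big)^2du$, so $\Gamma$ is singular exactly when some non-trivial combination has a vanishing derivative. The Python sketch below checks this with hypothetical sampled functions standing in for $u \mapsto D_u(\phi_i,p(\cdot,t|Y))$; the third function is deliberately a linear combination of the first two, so the resulting $\Gamma$ is singular.

```python
import math


def gram_matrix(funcs, dt):
    """Gamma[i][j] = sum_k g_i(u_k) g_j(u_k) dt, a Riemann sum for
    int_0^t D_u F_i D_u F_j du with the g_i sampled on a uniform grid."""
    n = len(funcs)
    return [[sum(a * b for a, b in zip(funcs[i], funcs[j])) * dt for j in range(n)]
            for i in range(n)]


def quad_form(G, xi):
    """xi^T G xi."""
    n = len(xi)
    return sum(xi[i] * G[i][j] * xi[j] for i in range(n) for j in range(n))


def combined_energy(funcs, xi, dt):
    """Riemann sum for int_0^t (sum_i xi_i g_i(u))^2 du."""
    comb = [sum(x * f[k] for x, f in zip(xi, funcs)) for k in range(len(funcs[0]))]
    return sum(v * v for v in comb) * dt
```

`quad_form(G, xi)` and `combined_energy(funcs, xi, dt)` agree up to rounding for every `xi`, and the quadratic form vanishes along the coefficient vector of the built-in linear dependence; this is why (2.11) for every non-zero $\phi$ is exactly what is needed.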